This is an internal implementation file of the hash map. Users of the hash map should not rely on the contents of this file.
File contents: Definition of all operations on Raw₀, definition of WFImp.
Hash map implementation notes #
This is a simple separate-chaining hash table. The data of the hash map (DHashMap.Raw) consists of
a cached size and an array of buckets, where each bucket is an AssocList α β (which is the same as
a List ((a : α) × β a) but with one less level of indirection). The number of buckets is always a
power of two. The hash map doubles its size upon inserting an element such that the number of
elements is more than 75% of the number of buckets.
Because we need DHashMap.Raw to be nested-inductive-safe, we cannot bundle the fact that there is
at least one bucket. We there for define a type Raw₀ which is just a Raw together with the fact
that the size is positive. Almost all internal work on the hash map happens on Raw₀ so that we do
not have to perform size check all of the time. The operations defined on Raw perform this size
check once to transform the Raw into a Raw₀ and then operate on that (therefore, each operation
on Raw will only do a single size check). The operations defined on DHashMap conclude that the
size is positive from the well-formedness predicate, use that to build a Raw₀ and then operate on
that. So the operations on DHashMap are exactly the same as the operations on Raw₀ and the
operations on Raw are the same as the operations on Raw₀ as long as we can prove that the size
check will succeed.
The operations on Raw₀ are defined in this file. They are built for performance. The IR is
hand-checked to ensure that there are no unnecessary allocations, inlining problems or linearity
bugs. Note that for many operations the IR only becomes meaningful once it has been specialized with
concrete Hashable and BEq instances as is the case in actual applications.
Of the internal files, only Internal.Index, Internal.Defs and Internal.AssocList.Basic contain
any executable code. The rest of the files set up the verification framework which is described in
the remainder of this text.
The basic idea is to translate statements about hash maps into statements about lists using the
function toListModel defined in this file. The function toListModel simply concatenates all
buckets into a List ((a : α) × β a). The file Internal.List.Associative then contains a complete
verification of associative lists. The theorems relating the operations on Raw₀ to the operations
on List ((a : α) × β a) are located in Internal.WF and have such names as
contains_eq_containsKey or toListModel_insert. In the file Internal.RawLemmas we then state
all of the lemmas for Raw₀ and use a tactic to apply the results from Internal.WF to reduce to
the results from Internal.List.Associative. From there we can state the actual lemmas for
DHashMap.Raw, DHashMap, HashMap.Raw, HashMap, HashSet.Raw and HashSet in the
non-internal *.Lemmas files and immediately reduce them to the results about Raw₀ in
Internal.RawLemmas.
There are some additional indirections to this high-level strategy. First, we have an additional
layer of so-called "model functions" on Raw₀, defined in the file Internal.Model. These have the
same signature as their counterparts defined in this file, but may have a slightly simpler
implementation. For example, Raw₀.erase has a linearity optimization which is not present in the
model function Raw₀.eraseₘ. We prove that the functions are equal to their model implementations
in Internal.Model, then verify the model implementation. This makes the verification more robust
against implementation details, future performance improvements, etc.
Second, reducing hash maps to lists only works if the hash map is well-formed. Our internal
well-formedness predicate is called Raw.WFImp (defined in this file) and states that (a) each
bucket only contains items with the correct hash, (b) the cached size is equal to the actual number
of elements in the buckets, and (c) after concatenating the buckets the keys in the resulting list
are pairwise not equal. The third condition is a priori stronger than the slightly more natural
condition that the keys in each bucket are pairwise not equal, but they are equivalent in
well-behaved cases and our condition works better. The user-visible well-formedness predicate
Raw.WF is equivalent to WFImp, as is shown in Internal.WF. The user-visible version exists to
postpone the proofs that all operations preserve well-formedness to a later file so that it is
possible to import DHashMap.Basic without pulling in all of the proof machinery.
The framework works very hard to make adding and verifying new operations very easy and
maintainable. To this end, we provide theorems apply_bucket, apply_bucket_with_proof,
toListModel_updateBucket and toListModel_updateAllBuckets, which do all of the heavy lifting in
a general way. The verification for each actual operation in Internal.WF is then extremely
straightforward, requiring only to plug in some results about lists. See for example the functions
containsₘ_eq_containsKey and the section on eraseₘ for prototypical examples of this technique.
Here is a summary of the steps required to add and verify a new operation:
- Write the executable implementation
- Implement the operation
AssocList.operationon associative lists inInternal.AssocList.Basic - Implement the operation
Raw₀.operationonRaw₀inInternal.Defs - Implement the operation
Raw.operation/DHashMap.operationonDHashMap.RawandDHashMapinDHashMap.Basic. If your operation modifies the hash map, this will involve adding a new constructoroperation₀toRaw.WF. In that case, don't forget to provide a well-formedness lemmaRaw.WF.operation(which differs fromRaw.WF.operation₀in that it is about the operation onRaw, not onRaw₀(remember, these differ by a bounds check)). - Implement the operation
Raw.operation/HashMap.operationonHashMap.RawandHashMapinHashMap.Basic. - Implement the operation
Raw.operation/HashSet.operationonHashSet.RawandHashSetinHashSet.Basic
- Write the list implementation
- Implement the operation
List.operationon lists inInternal.List.Associative - Connect the implementation on lists and associative lists in
Internal.AssocList.Lemmasvia a lemmaAssocList.operation_eq.
- Write the model implementation
- Write the model implementation
Raw₀.operationₘinInternal.Model - Prove that the model implementation is equal to the actual implementation in
Internal.Modelvia a lemmaoperation_eq_operationₘ.
- Verify the model implementation
- In
Internal.WF, proveoperationₘ_eq_List.operation(for access operations) orwfImp_operationₘandtoListModel_operationₘ - Immediately deduce the corresponding results
operation_eq_List.operationorwfImp_operation/toListModel_operationby combining the results from steps 3 and 4. - If you added a new constructor to
Raw.WFearlier, fix up the proof ofRaw.WF.out.
- Connect
RawandRaw₀
- Write
Raw.operation_eqandRaw.operation_valinInternal.Raw.
- Prove
Raw₀versions of user-facing lemmas
- State all of the user-facing lemmas that you want in
Internal.RawLemmas. This will generally involve connecting the operation to ALL existing operations or deciding that there is nothing to be said about a certain pair. - Prove the corresponding results about lists in
List.Associative - Use the tactic provided in
Internal.RawLemmasto deduce theRaw₀results from the list results. You will need to hook some of the results you proved in step 4 into the tactic. You might also have to prove that your list operation is invariant under permutation and add that to the tactic.
- State and prove the user-facing lemmas
- Restate all of your lemmas for
DHashMap.RawinDHashMap.RawLemmasand prove them using the provided tactic after hooking in youroperation_eqandoperation_valfrom step 5. - Restate all of your lemmas for
DHashMapinDHashMap.Lemmasand prove them by reducing toRaw₀. - Restate all of your lemmas for
HashMap.RawinHashMap.RawLemmasand prove them by reducing toDHashMap.Raw. - Restate all of your lemmas for
HashMapinHashMap.Lemmasand prove them by reducing toDHashMap. - Restate all of your lemmas for
HashSet.RawinHashSet.RawLemmasand prove them by reducing toHashMap.Raw. - Restate all of your lemmas for
HashSetinHashSet.Lemmasand prove them by reducing toHashMap.
This sounds like a lot of work (and it is if you have to add a lot of user-facing lemmas), but the framework is set up in such a way that each step is really easy and the proofs are all really short and clean.
Internal implementation detail of the hash map
Equations
- Std.DHashMap.Internal.computeSize buckets = Array.foldl (fun (d : Nat) (b : Std.DHashMap.Internal.AssocList α β) => d + b.length) 0 buckets
Instances For
Internal implementation detail of the hash map
Equations
- Std.DHashMap.Internal.Raw₀ α β = { m : Std.DHashMap.Raw α β // 0 < m.buckets.size }
Instances For
Internal implementation detail of the hash map
Instances For
Internal implementation detail of the hash map
Equations
- One or more equations did not get rendered due to their size.
Instances For
Copies all the entries from buckets into a new hash map with a larger capacity.
Equations
- Std.DHashMap.Internal.Raw₀.expand ⟨data_2, hd⟩ = Std.DHashMap.Internal.Raw₀.expand.go✝ 0 data_2 ⟨Array.replicate (data_2.size * 2) Std.DHashMap.Internal.AssocList.nil, ⋯⟩
Instances For
Internal implementation detail of the hash map
Equations
- One or more equations did not get rendered due to their size.
Instances For
Internal implementation detail of the hash map
Equations
- One or more equations did not get rendered due to their size.
Instances For
Internal implementation detail of the hash map
Equations
- One or more equations did not get rendered due to their size.
Instances For
Internal implementation detail of the hash map
Equations
- Std.DHashMap.Internal.Raw₀.get ⟨{ size := size, buckets := buckets }, h⟩ a hma_2 = Std.DHashMap.Internal.AssocList.getCast a buckets[(Std.DHashMap.Internal.mkIdx buckets.size h (hash a)).val] hma_2
Instances For
Internal implementation detail of the hash map
Equations
- Std.DHashMap.Internal.Raw₀.get! ⟨{ size := size, buckets := buckets }, hm⟩ a = Std.DHashMap.Internal.AssocList.getCast! a buckets[(Std.DHashMap.Internal.mkIdx buckets.size hm (hash a)).val]
Instances For
Internal implementation detail of the hash map
Equations
- Std.DHashMap.Internal.Raw₀.Const.get! ⟨{ size := size, buckets := buckets }, hm⟩ a = Std.DHashMap.Internal.AssocList.get! a buckets[(Std.DHashMap.Internal.mkIdx buckets.size hm (hash a)).val]
Instances For
Internal implementation detail of the hash map
Equations
- Std.DHashMap.Internal.Raw₀.getKey! ⟨{ size := size, buckets := buckets }, hm⟩ a = Std.DHashMap.Internal.AssocList.getKey! a buckets[(Std.DHashMap.Internal.mkIdx buckets.size hm (hash a)).val]
Instances For
This is the actual well-formedness predicate for hash maps. Users should never need to interact
with this and should use WF instead.
- buckets_hash_self : IsHashSelf m.buckets
Internal implementation detail of the hash map
Internal implementation detail of the hash map
- distinct : Internal.List.DistinctKeys (toListModel m.buckets)
Internal implementation detail of the hash map