You Were (Most Probably) Given Incomplete Info About How Python Dictionaries Work
Understanding the lesser-discussed internal workings of a Python dictionary.
While Python is probably one of the easiest programming languages to begin with, there are MANY things that make it quite weird and counterintuitive at times.
Today, I want to share one peculiarity about Python dictionaries, which most Python programmers aren’t aware of.
Let’s begin!
Suppose we declare a Python dictionary with the following four keys:
Integer type 1
Float type 1.0
Boolean type True
String type ‘1’
These four keys are four different objects, which we can verify by printing their object IDs with the built-in id() function.
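The original screenshot isn't reproduced here, so below is a quick way to check this yourself by printing each key's type and ID. The exact IDs vary between runs and machines. Only the fact that they differ matters:

```python
keys = [1, 1.0, True, "1"]

# Print each key with its type name and object ID
for key in keys:
    print(repr(key), type(key).__name__, id(key))

# All four IDs are distinct, so these really are four separate objects
print(len({id(key) for key in keys}))  # 4
```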
Thus, one might expect the final dictionary to have four keys.
However, when we print the dictionary, we notice that it has only two keys: 1 and '1'.
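Here is a minimal sketch of the declaration and its printed result. It uses placeholder values: the name my_dict and the strings "int", "float", "bool", and "str" are illustrative and do not come from the original.

```python
# Placeholder values (assumption): each value names the type of its key
my_dict = {1: "int", 1.0: "float", True: "bool", "1": "str"}

print(my_dict)       # {1: 'bool', '1': 'str'}
print(len(my_dict))  # 2
```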
Where did the other two keys go?
What are we missing here?
Internal workings of a Python dictionary
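Many Python programmers believe that dictionaries find, insert, and delete a key based on the ID of the object being used as the key.
But this is not true.
Instead, dictionaries rely on hash equivalence, computed with the built-in hash() function, combined with an equality check.
Thus, if two objects have the same hash value and also compare equal with ==, a dictionary will consider them identical keys. Two objects that merely share a hash value but are not equal are just a hash collision, and both are stored as separate keys.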
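The sketch below shows why the equality check matters. It uses a hypothetical SameHash class whose instances all hash to 42 but are only equal to themselves. Even though the hashes are identical, the dictionary keeps both keys:

```python
class SameHash:
    """Hypothetical class: every instance has the same hash, but instances
    are only equal to themselves (default identity-based __eq__)."""

    def __init__(self, label):
        self.label = label

    def __hash__(self):
        return 42  # deliberately force a hash collision

    def __repr__(self):
        return f"SameHash({self.label!r})"


a, b = SameHash("a"), SameHash("b")
print(hash(a) == hash(b))  # True: same hash
print(a == b)              # False: not equal

d = {a: "first", b: "second"}
print(len(d))  # 2: a collision alone does not merge keys
```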
This is precisely what happens with the ‘Integer 1’, ‘Float 1.0’, and ‘Boolean True’ objects.
These three objects have the same hash value, and they also compare equal to one another (1 == 1.0 == True).
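You can check both conditions directly in the interpreter:

```python
# hash() of the three numeric-like keys
print(hash(1), hash(1.0), hash(True))  # 1 1 1

# They also compare equal, which dicts require in addition to equal hashes
print(1 == 1.0 == True)  # True

# The string '1' is not equal to the integer 1, so it stays a separate key
print("1" == 1)  # False
```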
As a result, a Python dictionary considers them identical keys, even though they are entirely different objects.
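The same rule applies to lookups and deletions, not just insertions. Here is a sketch with a placeholder dictionary (the values are assumptions). Any of the three equal keys reaches the same entry:

```python
my_dict = {1: "int", 1.0: "float", True: "bool", "1": "str"}  # placeholder values

# Any of the three equal keys finds the same entry
print(my_dict[1], my_dict[1.0], my_dict[True])  # bool bool bool
print(1.0 in my_dict, True in my_dict)          # True True

# Deleting via an equal key removes the entry stored under 1
del my_dict[True]
print(my_dict)  # {'1': 'str'}
```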
One thing to notice here is that the final dictionary keeps the ‘Integer 1’ key, but its value is the one paired with the ‘Boolean True’ key.
This happens because of the order in which we specified the keys in the dictionary declaration.
Let’s break down the steps:
When the dictionary object is created, it starts out empty. From there, the keys are added one by one, from left to right.
First, ‘Integer 1’ is added, so the dictionary now holds a single entry under the key 1.
Next, while adding ‘Float 1.0’, Python finds that it has the same hash as the existing key ‘Integer 1’ and compares equal to it. Thus, the existing key (‘Integer 1’) is kept, but its value is updated to the one paired with ‘Float 1.0’.
Moving on, while adding ‘Boolean True’, another match occurs with the existing key ‘Integer 1’. Yet again, the existing key (‘Integer 1’) is kept, but its value is updated, this time to the one paired with ‘Boolean True’.
Finally, while adding ‘String 1’, no match is found, since the string '1' does not compare equal to the integer 1. A new key is therefore appended, giving us the final two-key dictionary.
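You can replay these steps by inserting the keys one at a time, in the same left-to-right order as the literal. This sketch uses placeholder values, and the name pairs is illustrative:

```python
# Rebuild the dictionary one key at a time, mirroring the
# left-to-right order of the literal (placeholder values assumed)
pairs = [(1, "int"), (1.0, "float"), (True, "bool"), ("1", "str")]

d = {}
for key, value in pairs:
    d[key] = value  # an equal existing key keeps its object; only the value changes
    print(f"after adding {key!r}: {d}")

# Printed output:
# after adding 1: {1: 'int'}
# after adding 1.0: {1: 'float'}
# after adding True: {1: 'bool'}
# after adding '1': {1: 'bool', '1': 'str'}
```

The loop ends in the same state as the dictionary literal, with the int 1 still the surviving key.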
We can validate this reasoning by changing the order in which the keys appear in the dictionary declaration.
This time, we get a different output, but it is consistent with the reasoning above.
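Here is one possible reordering. The original reordered declaration isn't shown, so this order is an assumption: the string key comes first and the integer key last.

```python
# Same placeholder values, keys declared in reverse order
reordered = {"1": "str", True: "bool", 1.0: "float", 1: "int"}
print(reordered)  # {'1': 'str', True: 'int'}
```

Now True is the surviving key, because it was the first of the three equal keys to be inserted. The value comes from 1, the last one.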
Takeaway
Contrary to common belief, Python dictionaries do not distinguish keys by object identity. Instead, two keys are treated as the same key when they have the same hash value and compare equal.
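The same rule extends to your own classes. A dict treats two instances as one key exactly when they hash equally and compare equal. Here is a minimal sketch with a hypothetical Point class:

```python
class Point:
    """Hypothetical example class whose equality and hash are based on
    its coordinates rather than on object identity."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal points must have equal hashes, so hash the same tuple used by __eq__
        return hash((self.x, self.y))


p, q = Point(2, 3), Point(2, 3)
print(p is q)  # False: two distinct objects

d = {p: "first"}
d[q] = "second"  # q equals p, so this updates the existing entry
print(len(d), d[Point(2, 3)])  # 1 second
```

Note that __hash__ is derived from the same fields as __eq__. Objects that compare equal must have equal hashes; otherwise, dictionary lookups become unreliable.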
Check out these newsletter issues next to learn about some more Python peculiarities or misinterpreted technicalities:
A Nasty Feature of Python That Many Programmers Aren't Aware Of.
"How" Python Prevents Us from Adding a List as a Dictionary's Key?
A For-loop and List Comprehension Are Fundamentally Different at Scope Level.
👉 Over to you: What are some other Python technical concepts that most programmers misinterpret?
Thanks for reading!
Latest full articles
If you’re not a full subscriber, here’s what you missed last month:
DBSCAN++: The Faster and Scalable Alternative to DBSCAN Clustering
Federated Learning: A Critical Step Towards Privacy-Preserving Machine Learning
You Cannot Build Large Data Projects Until You Learn Data Version Control!
Sklearn Models are Not Deployment Friendly! Supercharge Them With Tensor Computations.
Deploy, Version Control, and Manage ML Models Right From Your Jupyter Notebook with Modelbit
Gaussian Mixture Models (GMMs): The Flexible Twin of KMeans.
"Thus, if two objects have the same hash value, then a dictionary will consider them as identical keys."
That holds only if they are also equal (like 1 == True, 1.0 == 1). If they are not equal but have the same hash, that is just a **hash collision**, and all the objects are stored; see: https://stackoverflow.com/questions/9010222/why-can-a-python-dict-have-multiple-keys-with-the-same-hash