When You Should Not Use the head() Method In Pandas
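A common way to get the top k rows of a sorted Pandas DataFrame is to call the head() method. However, this approach has a flaw.
If the sort column contains repeated values, head(k) ignores them and simply returns the first k rows, so rows that tie with the k-th value are silently cut off.
To account for ties, use nlargest() (or nsmallest()) instead. Its keep parameter controls how duplicate values are handled: "first" (the default) and "last" return exactly k rows, preferring the first or last occurrences among the ties, while "all" keeps every row tied with the k-th value, even if that returns more than k rows.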
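
Here is a minimal sketch of the difference. The DataFrame, its column names, and the value of k are made up for illustration; only head(), sort_values(), and nlargest() with its keep parameter come from Pandas itself.

```python
import pandas as pd

# Hypothetical data: three rows tie on the second-highest salary.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "salary": [90, 80, 80, 80, 70],
})

k = 2

# Sort + head(): always exactly k rows, so the tie at 80 is cut arbitrarily.
top_head = df.sort_values("salary", ascending=False).head(k)

# nlargest with keep="all": every row tied with the k-th value is retained.
top_all = df.nlargest(k, "salary", keep="all")

# keep="first" (the default) and keep="last" return exactly k rows,
# choosing the first or last occurrences among the tied rows.
top_first = df.nlargest(k, "salary", keep="first")
top_last = df.nlargest(k, "salary", keep="last")

print(len(top_head))  # 2 rows
print(len(top_all))   # 4 rows: the 90 plus all three 80s
print(top_all)
```

With keep="all" the result can be longer than k, so downstream code should not assume a fixed row count.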