When You Should Not Use the head() Method In Pandas

Dec 17, 2022

A common way to retrieve the top k rows of a sorted Pandas DataFrame is to call the head() method. However, this approach has a flaw.

If the sorted column contains repeated values, head() ignores ties: it returns exactly the first k rows, even when later rows share the same value as the k-th row.

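To account for repeated values, use nlargest() (or nsmallest()) instead. Its keep parameter specifies how ties are handled: "first" (the default) and "last" still return exactly k rows but pick the earliest or latest tied occurrences, while "all" returns every row tied with the k-th value, so the result may contain more than k rows.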
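Here is a minimal sketch of the difference. The DataFrame, its column names, and the value of k are made up for illustration; only sort_values(), head(), and nlargest() with its keep parameter come from Pandas itself.

```python
import pandas as pd

# Hypothetical data: employees B and C tie for the second-highest salary.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "salary": [90, 80, 80, 70, 60],
})

k = 2

# Sort, then head(): always exactly k rows, so one of the tied rows is dropped.
top_head = df.sort_values("salary", ascending=False).head(k)

# nlargest with keep="all": every row tied with the k-th value is kept.
top_all = df.nlargest(k, "salary", keep="all")

# keep="first" (the default) and keep="last" return exactly k rows,
# preferring the earliest or the latest tied occurrence respectively.
top_first = df.nlargest(k, "salary", keep="first")
top_last = df.nlargest(k, "salary", keep="last")

print(top_head)
print(top_all)
```

With this data, top_head has two rows while top_all has three, because both rows with a salary of 80 are retained. nsmallest() accepts the same keep parameter for the bottom k rows.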

I like to explore, experiment and write about data science concepts and tools. You can read my articles on Medium. Also, you can connect with me on LinkedIn.

Daily Dose of Data Science
