Many Pandas users mix up the terminology for dataframe subsetting, so let's take a minute to get it straight.
𝐒𝐔𝐁𝐒𝐄𝐓𝐓𝐈𝐍𝐆 means extracting value(s) from a dataframe. This can be done in four ways:
1) We call it 𝐒𝐄𝐋𝐄𝐂𝐓𝐈𝐍𝐆 when we extract one or more of its 𝐂𝐎𝐋𝐔𝐌𝐍𝐒 based on index location or name. The output contains some columns and all rows.
2) We call it 𝐒𝐋𝐈𝐂𝐈𝐍𝐆 when we extract one or more of its 𝐑𝐎𝐖𝐒 based on index location or name. The output contains some rows and all columns.
3) We call it 𝐈𝐍𝐃𝐄𝐗𝐈𝐍𝐆 when we extract both 𝐑𝐎𝐖𝐒 and 𝐂𝐎𝐋𝐔𝐌𝐍𝐒 based on index location or name. The output contains some rows and some columns.
4) We call it 𝐅𝐈𝐋𝐓𝐄𝐑𝐈𝐍𝐆 when we extract 𝐑𝐎𝐖𝐒 and 𝐂𝐎𝐋𝐔𝐌𝐍𝐒 based on conditions.
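To make the four terms concrete, here is a minimal sketch of one way to perform each operation. The toy dataframe and its column names (`name`, `age`, `city`) are made up purely for illustration, and the `.loc`/`.iloc` calls shown are just one common option per operation, not the only one.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the four terms
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy", "Dee"],
        "age": [25, 32, 47, 19],
        "city": ["Oslo", "Rome", "Lima", "Pune"],
    },
    index=["a", "b", "c", "d"],
)

# 1) Selecting: some columns, all rows
selected = df[["name", "age"]]        # by column name
selected_pos = df.iloc[:, [0, 1]]     # by column position

# 2) Slicing: some rows, all columns
sliced = df.loc["b":"c"]              # by row label (end label is included)
sliced_pos = df.iloc[1:3]             # by row position (end position is excluded)

# 3) Indexing: some rows and some columns
indexed = df.loc[["a", "c"], ["name", "city"]]   # by label
indexed_pos = df.iloc[[0, 2], [0, 2]]            # by position

# 4) Filtering: rows (and optionally columns) chosen by a condition
filtered = df.loc[df["age"] > 30, ["name", "age"]]

print(filtered)
```

Note the difference in slicing: label-based slices with `.loc` include the end label, while position-based slices with `.iloc` exclude the end position.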
Of course, there are many other ways you can perform these four operations.
Here’s a comprehensive Pandas guide I prepared once: Pandas Map. Please refer to the “DF Subset” branch to read about various subsetting methods :)
Find the code for my tips here: GitHub.
I like to explore, experiment and write about data science concepts and tools. You can read my articles on Medium. Also, you can connect with me on LinkedIn and Twitter.