Speed Up Pandas' Parquet I/O by 5x
DataFrames are often stored in Parquet files and read with the Pandas read_parquet() function.
Rather than using Pandas, which relies on a single core, use fastparquet. It can deliver large speedups when reading and writing Parquet files by processing them in parallel.
Find more info here: Docs.
Find the code for my tips here: GitHub.
I like to explore, experiment and write about data science concepts and tools. You can read my articles on Medium. Also, you can connect with me on LinkedIn.