Dataframes are often stored in parquet files and read using Pandas' 𝐫𝐞𝐚𝐝_𝐩𝐚𝐫𝐪𝐮𝐞𝐭() method. Rather than using Pandas, which relies on a single-core, use fastparquet. It offers immense speedups for I/O on parquet files using parallel processing.
Speed-up Parquet I/O of Pandas by 5x
Speed-up Parquet I/O of Pandas by 5x
Speed-up Parquet I/O of Pandas by 5x
Dataframes are often stored in parquet files and read using Pandas' 𝐫𝐞𝐚𝐝_𝐩𝐚𝐫𝐪𝐮𝐞𝐭() method. Rather than using Pandas, which relies on a single-core, use fastparquet. It offers immense speedups for I/O on parquet files using parallel processing.