In many situations, the data is often split into multiple CSV files and transferred to the DS/ML team for use. As Pandas does not support parallelization, one has to iterate over the list of files and read them one by one for further processing. "Datatable" can provide a quick fix for this. Instead of reading them iteratively with Pandas, you can use Datatable to read a bunch of files. Being parallelized, it provides a significant performance boost as compared to Pandas.
How to Read Multiple CSV Files Efficiently
How to Read Multiple CSV Files Efficiently
How to Read Multiple CSV Files Efficiently
In many situations, the data is often split into multiple CSV files and transferred to the DS/ML team for use. As Pandas does not support parallelization, one has to iterate over the list of files and read them one by one for further processing. "Datatable" can provide a quick fix for this. Instead of reading them iteratively with Pandas, you can use Datatable to read a bunch of files. Being parallelized, it provides a significant performance boost as compared to Pandas.