40 Open-Source Tools to Supercharge Your Pandas Workflow
Pandas receives over 3M downloads per day, yet most of its users never tap its full potential.
Here are 40 open-source gems that can supercharge your Pandas workflow as soon as you start using them.
Jupyter-Datatables: Enrich the default preview of a DataFrame in Jupyter Notebook.
SummaryTools: Supercharge the describe() method in Pandas.
Sidetable: Supercharge the value_counts() method in Pandas.
Sketch: Generate code/insights about data by asking questions in natural language.
Link: https://bit.ly/py-sketch
Deepchecks: Generate a comprehensive validation report of your data.
Link: https://bit.ly/deepchks
Pandas Flavor: Extend Pandas to attach methods to the dataframe object.
Link: https://bit.ly/pd-flavor
Pandarallel: Parallelize Pandas across multiple CPU cores.
PandasML: Pandas, sklearn and matplotlib integrated.
Link: https://bit.ly/pandasml
GeoPandas: Work with geospatial data in Pandas.
Link: https://bit.ly/geo-pd
DuckDB: Run SQL queries on dataframes.
Link: https://bit.ly/duckdb
Modin: Boost Pandas' performance up to 70x by modifying the import.
PivotTableJS: Create pivot tables with drag and drop.
Missingno: Visualize missing values in your dataset.
Pandas Alive: Create animated charts for pandas dataframes.
Link: https://bit.ly/pd-alive
Skimpy: Supercharge the describe() method in Pandas.
Link: https://bit.ly/py-skimpy
Pandas-log: Debug pandas pipelines with step-by-step logging.
Link: https://bit.ly/py-log
tsflex: Process time series and perform feature extraction.
Link: https://bit.ly/tsflex
pandas-profiling (now ydata-profiling): Generate an EDA report of your data in one line of code.
Mars: A tensor-based framework for scaling numpy, pandas, scikit-learn, and Python functions.
Link: https://bit.ly/py-mars
nptyping: Add type hints to Pandas dataframes.
Link: https://bit.ly/nptyping
popmon: Profile your data to determine its stability.
Link: https://bit.ly/py-popmon
Gspread-pandas: Interact with Google Sheets through pandas dataframes.
pdpipe: Create pandas pipelines easily and intuitively.
Link: https://bit.ly/py-pdpipe
PrettyPandas: Prettify the dataframe when printed.
Dora: An intuitive API for data cleaning, processing, feature selection, visualization, etc.
Link: https://bit.ly/py-dora
Pandapy: The speed of NumPy combined with Pandas' elegance.
Link: https://bit.ly/pandapy
PyJanitor: A clean API for cleaning data.
Link: https://bit.ly/pyjanitor
swifter: Speed-up the apply() method in Pandas.
Mito: Analyze data in Jupyter by editing a spreadsheet.
Link: https://bit.ly/mito-ds
Visual Python: GUI-based Python code generator for data science.
Link: https://bit.ly/visual-py
tqdm: Add progress bars to Pandas methods.
Link: https://bit.ly/tqdm-pd
Lux: Automatic data visualization.
Link: https://bit.ly/pd-lux
D-Tale: Visualizer for pandas dataframe.
Link: https://bit.ly/py-dtale
AutoClean: Automated data preprocessing & cleaning.
pytablewriter: Write a dataframe in various formats: AsciiDoc / CSV / HTML / JSON / LaTeX / Markdown / Excel / TOML / TSV / YAML, etc.
itables: Pandas dataframes as interactive datatables.
Link: https://bit.ly/itables
PandasGUI: A GUI for Pandas dataframes.
Link: https://bit.ly/PandasGUI
tabula-py: Extract tables from PDFs into Pandas dataframes.
Link: https://bit.ly/tabulapy
Pingouin: Perform statistical tests on Pandas dataframes.
Dexplot: Create many types of beautiful data visualizations with a simple, consistent, and intuitive API.
Link: https://bit.ly/dexplot
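To give a feel for how little code some of these tools need, here is a minimal sketch for the tqdm entry in the list. It assumes pandas and tqdm are installed; the toy dataframe, the column names, and the square() helper are made up for illustration and are not from any of the linked projects.

```python
import pandas as pd
from tqdm import tqdm

# Register progress_apply / progress_map on pandas objects.
tqdm.pandas(desc="squaring")

# Toy data, invented for this example.
df = pd.DataFrame({"value": range(1_000)})


def square(x: int) -> int:
    """Hypothetical per-row function; swap in your own."""
    return x * x


# Same result as df["value"].apply(square), with a progress bar
# printed while the rows are processed.
df["squared"] = df["value"].progress_apply(square)
```

The only changes from plain Pandas are the tqdm.pandas() call and swapping apply() for progress_apply(), so it is easy to add to (or remove from) an existing pipeline.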
That’s a wrap!!
What cool Python libraries would you add to this list?
👇 Drop your suggestions in the replies below 👇