3 Tools That Slash Data Science Workloads by 60%

Become a Superhuman Data Scientist in the era of AI

Hi! 👋 I am Roxanne, an ex-IBM Data Scientist and a Coursera Instructor of Machine Learning. Subscribe to my newsletter “Is Data Science Still Sexy” and join the 1,000+ readers today!

In my previous article, I covered 4 essential skills that Data Scientists need if they don't want to be replaced by future AI automation tools (think of AutoML, for example). Now the question is: if AI will inevitably keep getting better at saving time across all kinds of professions, can we human Data Scientists get better at saving time too?

As a Data Scientist myself, I searched the internet and read 10+ articles on time-saving Data Science libraries. I did the “due diligence” of cross-referencing them (so you don’t have to! 😃) and picked the ones that I know are truly useful based on my past work experience. You will be a “superhuman Data Scientist 😎” if you know how to integrate them into your routine Data Science workflow.

🌟 Optuna

Setting up the hyperparameter tuning process is repetitive and time-consuming, and oftentimes you just let it run with no idea how far along it is unless you log the output.

Imagine having an assistant that not only sets up and optimizes the search for you, regardless of which model you tune, but also keeps you posted on the progress with charts. Tell me you don’t want that for your hyperparameter tuning 🤣. Seriously, this is exactly what Optuna will do for you, for free!!

🌟 PyCaret

🔈️ “Data Scientists spend 60% of their time on data cleaning.” Sound familiar? It just doesn’t make sense not to have a time-saving library for data cleaning. We’ve invested so much of our time in becoming good Data Scientists that I’d say our work should involve more creativity than conformity.

Imputing missing values, changing data types, one-hot encoding, removing outliers, feature scaling… Are you not tired of spending hours writing virtually the same code in all your data science projects? TRY PYCARET❗️

Again, it’s open-source (free), so it really wouldn’t hurt to step out of your routine and give it a shot. If it isn’t to your taste, you can still go back to your routine. But if it is, I would be super happy that I helped you get one step closer to becoming a superhuman Data Scientist 👏 .

🌟 Gradio

As a Data Scientist, you built your model in a Jupyter notebook. Now what? Presenting your model in a Jupyter notebook to a non-technical stakeholder may not go over as well as you think 🙂. A simple but illustrative user interface for your model would be nice 🤖. Can we do that without knowing how to code an application with front-end and back-end frameworks? Absolutely!

Gradio allows you to build a simple user interface (or a machine-learning app) for your model with very few lines of Python code. You just tell Gradio explicitly which component, such as a textbox or an image, should hold your model’s input and output.

I have personally used Gradio many times. I know it’s so useful and convenient that I even built a free guided project around it. Feel free to check it out if you don’t feel like going through Gradio’s official documentation.

That’s a wrap for today! Thanks for reading and subscribing 🫡 . I’ll see you in my next one 👋 .