Wednesday, 23 November 2022

Automated Machine Learning and the Future of Data Science


Automated Machine Learning tools

What is Automated Machine Learning?




Most of you probably already know what AutoML is, or have at least heard of it. Automated Machine Learning (AutoML) is the practice of automating large parts of the machine learning pipeline, and the name is most often associated with Google's Cloud AutoML product. But Google is not the only company to have developed such a tool: Microsoft, Oracle and Amazon all have their own implementations in the cloud.

Automated Machine Learning tools are built to perform a deep and wide search over a large number of models and hyperparameters in order to find the best model and the best feature engineering for a given problem. Besides automating a large part of a Machine Learning project, they are fairly easy to get started with, especially for people with software or cloud experience who want to run their models at scale.

Because of this, AutoML and a plethora of other low-code/no-code tools are taking the industry by storm, and there is no shortage of articles claiming that Automated Machine Learning will replace Data Scientists. That might not be the case, at least for the time being.


What Does Automated Machine Learning Do?

AutoML aims to automate most of the machine learning workflow.

Automated Machine Learning is typically run on cloud infrastructure such as Google Cloud, Azure Machine Learning or Amazon SageMaker. Its purpose is to replace the manual work of tuning and model experimentation that today's Data Scientists do, by searching for the optimal models and hyperparameters for a given modeling task. AutoML also handles the iterative part: its main function is to optimize an evaluation metric, so it keeps trying new candidates until it reaches the best result it can find within its time or compute budget.
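To make the search-and-iterate idea concrete, here is a minimal sketch of the kind of loop an AutoML tool runs, written in plain Python so it needs no cloud account. It is not how Google, Azure or SageMaker implement AutoML internally: it assumes plain random search over a tiny, hypothetical search space (a k-nearest-neighbours classifier and a nearest-centroid classifier), hypothetical toy data, and validation accuracy as the evaluation metric. Names such as `auto_ml`, `SEARCH_SPACE` and `make_data` are illustrative only.

```python
import random

# Hypothetical toy data: 2-D points in the unit square, label 1 when x + y > 1.
def make_data(n, seed):
    rng = random.Random(seed)
    X = [(rng.random(), rng.random()) for _ in range(n)]
    y = [1 if a + b > 1 else 0 for a, b in X]
    return X, y

def distance(p, q, metric):
    if metric == "manhattan":
        return abs(p[0] - q[0]) + abs(p[1] - q[1])
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5  # euclidean

def fit_knn(X, y, k, metric):
    # "Training" k-NN just keeps the data; prediction is a majority vote of the k nearest points.
    def predict(p):
        nearest = sorted(zip(X, y), key=lambda xy: distance(p, xy[0], metric))[:k]
        return 1 if 2 * sum(label for _, label in nearest) > k else 0
    return predict

def fit_centroid(X, y, metric):
    # Nearest-centroid classifier: one mean point per class (assumes both classes are present).
    centroids = {}
    for label in (0, 1):
        pts = [p for p, t in zip(X, y) if t == label]
        centroids[label] = (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
    return lambda p: min(centroids, key=lambda c: distance(p, centroids[c], metric))

# Hypothetical search space: model families and the hyperparameter values to try.
SEARCH_SPACE = {
    "knn": {"k": [1, 3, 5, 7, 9, 15], "metric": ["euclidean", "manhattan"]},
    "nearest_centroid": {"metric": ["euclidean", "manhattan"]},
}
TRAINERS = {"knn": fit_knn, "nearest_centroid": fit_centroid}

def accuracy(model, X, y):
    return sum(model(p) == t for p, t in zip(X, y)) / len(y)

def auto_ml(X_train, y_train, X_val, y_val, budget=20, seed=0):
    """Random search: sample (model, hyperparameters), score on validation data, keep the best."""
    rng = random.Random(seed)
    best_score, best_config = -1.0, None
    for _ in range(budget):  # the budget caps how many candidates are tried
        family = rng.choice(sorted(SEARCH_SPACE))
        params = {name: rng.choice(values) for name, values in SEARCH_SPACE[family].items()}
        model = TRAINERS[family](X_train, y_train, **params)
        score = accuracy(model, X_val, y_val)
        if score > best_score:
            best_score, best_config = score, (family, params)
    return best_score, best_config

X_train, y_train = make_data(200, seed=1)
X_val, y_val = make_data(100, seed=2)
score, (family, params) = auto_ml(X_train, y_train, X_val, y_val)
print(f"best: {family} {params} validation accuracy={score:.2f}")
```

Random search is the simplest possible strategy. Managed AutoML services often use smarter search methods (for example Bayesian optimization) and also search over feature-engineering steps, but the structure is the same: propose a candidate, train it, score it on held-out data, and keep the best one found within the budget.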


Once the model has been trained, it can easily be deployed to a production instance in the cloud, where model monitoring checks can be set up to review metrics and diagnostics such as Precision-Recall curves, Feature Importance and more.
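As an illustration of the kind of check a monitoring setup might run, the sketch below computes the points of a Precision-Recall curve from a model's predicted scores and flags the model when precision at the serving threshold drops below a minimum. The scores, labels, the 0.5 threshold and the 0.8 minimum are made-up values, and `precision_recall_points` and `precision_alert` are hypothetical helpers, not part of any cloud provider's API.

```python
def precision_recall_points(scores, labels):
    """Return (threshold, precision, recall) for every distinct score used as a cut-off."""
    total_positives = sum(labels)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is predicted positive.
        flagged = [label for score, label in zip(scores, labels) if score >= threshold]
        true_positives = sum(flagged)
        precision = true_positives / len(flagged)  # never empty: threshold is an observed score
        recall = true_positives / total_positives if total_positives else 0.0
        points.append((threshold, precision, recall))
    return points

def precision_alert(scores, labels, threshold, min_precision):
    """Hypothetical monitoring rule: True when precision at the serving threshold is too low."""
    flagged = [label for score, label in zip(scores, labels) if score >= threshold]
    precision = sum(flagged) / len(flagged) if flagged else 0.0
    return precision < min_precision

# Made-up scores from a deployed binary classifier and the labels observed later.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0]

for threshold, precision, recall in precision_recall_points(scores, labels):
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
print("alert:", precision_alert(scores, labels, threshold=0.5, min_precision=0.8))
```

In a real deployment the true labels usually arrive later than the predictions, so a check like this would typically run on a schedule over recently labelled traffic.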

The cloud platforms also have specific Deep Learning products with AutoML for use cases such as Vision (Object Detection), NLP, Time Series (Forecasting), and many more.


The future of Data Science and of Data Scientists

By the look of things, domain knowledge will rule the future for Data Scientists. Understanding the relationship between inputs and outputs in human-interpretable ways, and having the skills to communicate that understanding, is the most important input to predictive modeling. To build better model inputs, specialists must understand the business problem the model is being applied to.

It is expected that the Data Scientist of the future will spend most of their time designing experiments, validating hypotheses, working closely with the business, and writing SQL to build features that improve model accuracy.
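To give a flavour of the "writing SQL to build features" part, the sketch below uses Python's built-in `sqlite3` module to turn a raw, hypothetical `orders` table into one row of features per customer: order count, average order value and days since the last order, measured from this post's date. The table, its columns and the feature choices are assumptions for illustration; in practice the same kind of query would run against the company's own data warehouse.

```python
import sqlite3

# Hypothetical raw data: one row per order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 20.0, "2022-10-01"), (1, 35.5, "2022-11-02"), (2, 12.0, "2022-09-15")],
)

# Aggregate the raw orders into per-customer model features.
FEATURE_QUERY = """
SELECT customer_id,
       COUNT(*)    AS n_orders,
       AVG(amount) AS avg_amount,
       CAST(julianday('2022-11-23') - julianday(MAX(order_date)) AS INTEGER)
                   AS days_since_last_order
FROM orders
GROUP BY customer_id
ORDER BY customer_id
"""
features = conn.execute(FEATURE_QUERY).fetchall()
for row in features:
    print(row)
conn.close()
```

A feature table like this would then be joined to the training labels and handed to the AutoML tool, which is exactly the split of work between person and tool described above.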

So, in the end, I believe that the Data Scientist will not be replaced, but the profession itself will undergo major changes in what its role consists of, since AutoML cannot explain models to business leaders or recommend what actions to take based on them.


Bibliography

  • https://aliz.ai/en/blog/automl-an-introduction-to-get-you-started/
  • https://medium.com/analytics-vidhya/an-introduction-to-automl-8356b6ceb091
  • https://docs.google.com/presentation/d/1dp9-F3lGInr_H8sTFEOVKBNCPGkplk7bb99kvpkJdmQ/htmlpresent
  • https://fortune.com/education/business/articles/2022/05/26/the-value-of-a-data-science-degree-as-told-by-microsofts-chief-data-scientist/
