Thursday, November 24, 2022

ML enhancing Cloud Security via Microsoft Sentinel

As technology advances, cloud solutions are becoming more and more common, and with them the need for better cloud cybersecurity solutions is also increasing. In this area, machine learning has helped by reducing the cost and response time required for managing these threats.

A prime example of this would be Microsoft Sentinel. Sentinel is a scalable, cloud-native solution that delivers intelligent security analytics and threat intelligence across the enterprise. With Microsoft Sentinel, you get a single solution for attack detection, threat visibility, proactive hunting, and threat response.


Sentinel comes with a set of functionalities, such as an easy way to gather data across the enterprise. You can collect data at cloud scale, across all users, and combine it with other security information in order to find threats. Almost all Microsoft solutions connect to Sentinel via data connectors that provide real-time integration.


Sentinel uses highly scalable machine learning algorithms to detect threats and reduce the number of false positives. At Ignite 2021, Microsoft introduced enhancements to its Fusion analytics, which constantly learn from past attacks, apply analysis, and surface threats that would otherwise be very difficult to find. Besides that, UEBA (User and Entity Behavior Analytics) models help identify threats based on behavioral anomalies.


To understand the scope of an attack, Sentinel uses AI-based investigation and also lets you automate the process by building hunting queries and Jupyter notebooks. The built-in queries are one of the best features Microsoft Sentinel provides, because they allow you to “hunt” for security threats across your data even before an alert is triggered.
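Hunting queries are written in KQL and run against the Log Analytics workspace behind Sentinel. As a rough sketch of how such a query could also be run programmatically (the workspace ID, table, and threshold below are illustrative, not taken from the documentation), Python's azure-monitor-query client can execute KQL directly:

```python
# Sketch: run a hypothetical hunting query against a Sentinel/Log Analytics workspace.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# Illustrative KQL: accounts with an unusually high number of failed sign-ins.
query = """
SigninLogs
| where ResultType != "0"
| summarize FailedAttempts = count() by UserPrincipalName
| where FailedAttempts > 20
| order by FailedAttempts desc
"""

# Assumes the query succeeds; production code should also handle partial results.
response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",
    query=query,
    timespan=timedelta(days=1),
)
for table in response.tables:
    for row in table.rows:
        print(row)
```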


Speaking of automation, you can build playbooks, collections of response procedures built on Azure Logic Apps that help you schedule, automate, and orchestrate tasks and workflows across systems throughout the enterprise, or use the ones already provided by Microsoft to handle repetitive tasks and respond to threats quickly.



Sources:
- https://learn.microsoft.com/en-us/azure/sentinel/
- https://techcommunity.microsoft.com/t5/microsoft-sentinel-blog/microsoft-sentinel-introduces-enhancements-in-machine-learning/ba-p/2897871

Wednesday, November 23, 2022

Automated Machine Learning and the Future of Data Science

 

Automated Machine Learning tools

What is Automated Machine Learning?




Most of you probably already know what AutoML is, or have at least heard of it. Automated Machine Learning is a class of tools, popularized by Google, that automate the full machine learning pipeline. But Google is not the only one to have developed such a tool; Microsoft, Oracle, and Amazon all have their own implementations in the cloud.

Automated Machine Learning tools are built to perform a deep and wide search over a vast number of models and hyperparameters to find the best model and the best feature engineering for the problem. Besides automating a large part of machine learning projects, they are fairly easy to get started with and reasonably user-friendly, especially for people with software or cloud experience who want to run their models at scale.
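To make the idea concrete, here is a minimal local sketch of the kind of search an AutoML tool performs, written with scikit-learn rather than any cloud AutoML product; the dataset, candidate models, and grids are illustrative only:

```python
# Sketch: search over several model families and hyperparameter grids,
# keeping the model with the best cross-validated score.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search_space = [
    (LogisticRegression(max_iter=5000), {"C": [0.01, 0.1, 1.0, 10.0]}),
    (RandomForestClassifier(random_state=0),
     {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}),
]

best_score, best_model = -1.0, None
for estimator, grid in search_space:
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    if search.best_score_ > best_score:
        best_score, best_model = search.best_score_, search.best_estimator_

print(f"Best CV accuracy: {best_score:.3f} with {type(best_model).__name__}")
print(f"Held-out accuracy: {best_model.score(X_test, y_test):.3f}")
```

A real AutoML service layers automated feature engineering, early stopping, and model ensembling on top of this basic search loop.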

Because of this, AutoML and a plethora of low-code/no-code tools are taking the industry by storm. There are more than enough articles claiming that Automated Machine Learning will replace Data Scientists. But that might not be the case, at least for the time being.


What Does Automated Machine Learning Do?

AutoML automates the entire machine learning workflow.

Automated Machine Learning is usually run on cloud infrastructure such as Google Cloud, Azure Machine Learning, or Amazon SageMaker. Its purpose is to replace the manual parts of tuning and model experimentation done by today's Data Scientists by searching for optimal hyperparameters and models for a given modeling task. Even the iterative piece is handled by AutoML: its main function is to optimize an evaluation metric, so it keeps iterating until the best results are achieved.


Once the model has been trained, it can be easily deployed to a production instance in the cloud, where model monitoring checks are set up to review metrics such as precision-recall curves, feature importance, and more.

The cloud platforms also have specific Deep Learning products with AutoML for use cases such as Vision (Object Detection), NLP, Time Series (Forecasting), and many more.


The future of Data Science and of Data Scientists

By the look of things, it seems that domain knowledge will rule the future for Data Scientists. Understanding the relationship between inputs and outputs in human-interpretable ways, and having the skills to communicate this knowledge, is the most important input to predictive modeling. In order to build better model inputs, the business problem that the model is being applied to must be understood by specialists.

It is believed that the Data Scientist of the future will spend most of their time designing experiments, confirming hypotheses, embedding themselves close to the business, and writing SQL to build features that improve model accuracy.

So, in the end, I believe that the Data Scientist will not be replaced, but the profession itself will undergo some major changes in what constitutes its role, since AutoML cannot handle interpreting models for business leaders, recommending what actions to take, and so on.


Bibliography

  • https://aliz.ai/en/blog/automl-an-introduction-to-get-you-started/
  • https://medium.com/analytics-vidhya/an-introduction-to-automl-8356b6ceb091
  • https://docs.google.com/presentation/d/1dp9-F3lGInr_H8sTFEOVKBNCPGkplk7bb99kvpkJdmQ/htmlpresent
  • https://fortune.com/education/business/articles/2022/05/26/the-value-of-a-data-science-degree-as-told-by-microsofts-chief-data-scientist/

Tuesday, November 22, 2022

Digital olfaction

 

Image generated using Dall-E 2 ("artificial intelligence smelling a flower")


In the world of computers, the ability to smell has been made possible through artificial intelligence. This may sound like a futuristic concept, but it’s actually being used today in a variety of industries, from food production to healthcare.

Digital “noses” work by detecting and identifying chemicals in the air. This information is then analyzed by algorithms that have been specifically designed to interpret the data. The use of artificial intelligence in this way has a number of advantages. Machines don’t get tired and they can be programmed to ignore certain smells that might otherwise interfere with their ability to identify other, more important smells, making them much more accurate than the human nose. What’s more, digital noses can be used to detect very faint smells that would be undetectable to the human nose.
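As a toy illustration of the "analyzed by algorithms" part (entirely synthetic data, not a real e-nose dataset), each smell can be represented as the response pattern of an array of chemical sensors and fed to an ordinary classifier:

```python
# Sketch: classify odors from synthetic sensor-array response patterns.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_samples, n_sensors = 600, 16

# Assume three odor classes, each with a characteristic sensor-response profile.
profiles = rng.uniform(0.2, 1.0, size=(3, n_sensors))
labels = rng.integers(0, 3, size=n_samples)
readings = profiles[labels] + 0.1 * rng.standard_normal((n_samples, n_sensors))

X_train, X_test, y_train, y_test = train_test_split(readings, labels, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"Odor classification accuracy on held-out data: {clf.score(X_test, y_test):.2f}")
```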

Digital olfaction can be extremely helpful in a number of different industries. For example, digital noses are being used in the food industry to detect early signs of food spoilage.
Aryballe is a leading provider of digital olfaction technology. Aryballe has developed a range of sensors that can be used to detect a wide variety of odors, including those associated with spoilage. This technology is helping food companies to improve food safety and quality control.

One of the most promising applications of digital olfaction is in the early detection of cancer. Cancerous cells produce unique volatile organic compounds (VOCs) that can be detected by sensors. In one study, digital olfaction was able to correctly identify the VOCs associated with lung cancer with high accuracy.

Digital olfaction has the potential to revolutionise the way we diagnose and treat patients with Parkinson’s disease (PD). Currently, the gold standard for diagnosing PD is through clinical assessment, which can be unreliable. There is no definitive diagnostic test for PD, and the current methods for diagnosis are often invasive, expensive, and/or require specialised equipment. Digital olfaction has the potential to improve the accuracy of PD diagnosis and to provide a more convenient and less invasive way to test for PD. This technology could also be used to monitor PD progression and to evaluate the efficacy of PD treatments.

This technology is still in its early stages, but it shows a lot of promise. In the future, it’s likely that digital noses will become even more widespread and play an important role in a variety of different industries.

Bibliography:




Thursday, November 17, 2022

AI Generated Prize Winning Picture

    A few weeks ago, there was a heated debate regarding the implications of AI in the process of art making. Jason M. Allen entered one such generated piece of art into the digital art category of the fine arts competition at the Colorado State Fair, where he won first prize. This drew complaints from other artists, some of whom eventually accused him of cheating; yet Jason M. Allen did not break any of the rules, as the entry was registered under the title “Jason M. Allen via Midjourney”, and therefore he kept the prize.

    In simple terms, the application works by entering some words into a text box describing what the artist would like to create; the AI, trained on a dataset of other artists' works (both copyrighted and not), then produces a set of images.
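    Midjourney itself is accessed through Discord, but the open-source Stable Diffusion model (mentioned below) works on the same principle. A rough sketch using the Hugging Face diffusers library, assuming a CUDA GPU is available; the model checkpoint and prompt are illustrative:

```python
# Sketch: text-to-image generation with Stable Diffusion via diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative model checkpoint
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a grand opera stage bathed in golden light, baroque costumes, dramatic lighting"
images = pipe(prompt, num_images_per_prompt=3).images  # a set of candidate images
for i, image in enumerate(images):
    image.save(f"generation_{i}.png")
```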

    The image that won the fine arts competition is the one presented above; it was generated using Midjourney. Midjourney is one of the best tools available, and at the moment anyone can try it on its Discord server. Here is a link presenting some of the best work created by the community: https://www.midjourney.com/showcase/.

    One of the biggest breakthroughs in AI-generated art happened in 2015, when Alexander Mordvintsev (a Google engineer) created DeepDream, a program that uses neural networks to create images similar to what one might see after ingesting LSD. The following image represents this LSD-like experience:



    Other notable examples of such applications are OpenAI’s DALL-E 2 and Stable Diffusion, which can similarly create realistic images and art from a description given in natural language.

    

    Regarding the moral implications of these powerful algorithms, do you think this represents the end of visual artists? We definitely think not; we see them rather as powerful tools that allow artists to express their creativity and to brainstorm ideas that can then bloom into even more impressive pieces of art.


    Bibliography:

Thursday, November 10, 2022


Body fat predictions


Introduction

Machine learning is now used to create new measurements that help correlate body fat with cardiometabolic diseases.

The dataset used for developing the convolutional neural networks (CNNs) consists of MRI imaging data collected from 40,032 participants from the UK.

MRI, or magnetic resonance imaging, is a noninvasive way to examine a person's organs, tissues, and skeletal system. In short, the procedure produces high-resolution images of the inside of the body, which gives doctors a clearer view of what is happening.

In order to train the CNNs, the specialists split the data into two separate parts: the first part, used for training, consisted of 9,041 samples, while the second part of the dataset (30,991 samples) was used for testing.


The samples used for training had already been quantified with different measurements, for example:
            - visceral adipose tissue (VAT)
            - abdominal subcutaneous adipose tissue (ASAT)
            - gluteofemoral adipose tissue (GFAT)

After training, the CNNs were used to quantify these measurements for the remaining participants. By doing this, the scientists were able to derive new metrics that are fully independent of BMI (Body Mass Index). The new metrics are called (a sketch of one common way to derive such BMI-adjusted values follows the list):
            - VAT adjusted for BMI (VATadjBMI)
            - ASAT adjusted for BMI (ASATadjBMI)
            - GFAT adjusted for BMI (GFATadjBMI)
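One common way to make a measurement independent of BMI, and a reasonable guess at what "adjusted for BMI" means here (see the paper for the exact procedure), is to keep the residuals of a linear regression of the fat-depot measurement on BMI. A minimal sketch on synthetic data:

```python
# Sketch: derive a BMI-adjusted metric as the residual of a linear regression
# on BMI (synthetic data; illustrative of the idea, not the paper's exact method).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
bmi = rng.normal(27, 4, n)                       # body mass index
vat = 0.3 * bmi + rng.normal(0, 0.8, n)          # VAT correlates strongly with BMI

reg = LinearRegression().fit(bmi.reshape(-1, 1), vat)
vat_adj_bmi = vat - reg.predict(bmi.reshape(-1, 1))  # VATadjBMI: part of VAT not explained by BMI

print("Correlation of VAT with BMI:      ", round(np.corrcoef(bmi, vat)[0, 1], 3))
print("Correlation of VATadjBMI with BMI:", round(np.corrcoef(bmi, vat_adj_bmi)[0, 1], 3))
```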


The results of the CNNs showed near-perfect estimation of VAT, ASAT, and GFAT.

By associating the presence of type 2 diabetes with the new metrics, the following results were obtained (a sketch of how an odds ratio per standard deviation is estimated follows the list):
            - VATadjBMI showed a significantly increased risk, with an OR/SD (odds ratio per standard deviation increase) of 1.49 and a 95% CI (confidence interval)
            - ASATadjBMI was largely neutral, with an OR/SD of 1.08 and a 95% CI
            - GFATadjBMI conferred protection, with an OR/SD of 0.75 and a 95% CI
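An odds ratio per standard deviation is typically obtained by standardizing the exposure and exponentiating its logistic-regression coefficient. A minimal sketch on synthetic data (not the paper's data or code):

```python
# Sketch: odds ratio per standard deviation (OR/SD) from a logistic regression
# on synthetic data; this illustrates the metric, not the paper's analysis.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
vat_adj_bmi = rng.normal(0, 1.5, n)                          # synthetic BMI-adjusted VAT
z = (vat_adj_bmi - vat_adj_bmi.mean()) / vat_adj_bmi.std()   # standardize: per-SD units

# Simulate type 2 diabetes status with higher risk at higher VATadjBMI.
p = 1 / (1 + np.exp(-(-2.0 + 0.4 * z)))
t2d = rng.binomial(1, p)

model = sm.Logit(t2d, sm.add_constant(z)).fit(disp=0)
or_per_sd = np.exp(model.params[1])
ci_low, ci_high = np.exp(model.conf_int()[1])
print(f"OR/SD = {or_per_sd:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```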





Bibliography

https://www.medrxiv.org/content/10.1101/2021.05.07.21256854v2

 

Application of Machine Learning in Electromagnetics


Abstract

As an integral part of electromagnetic systems, antennas are becoming more advanced and versatile than ever before, making it necessary to adopt new techniques to enhance their performance. Machine Learning (ML), a branch of artificial intelligence, is a method of data analysis that automates analytical model building with minimal human intervention. Its potential for solving unpredictable, non-linear, complex challenges is attracting researchers in the field of electromagnetics, especially in antennas and antenna-based systems. Although the accuracy of machine learning algorithms depends on the availability of sufficient data and expert handling of the model, it is steadily becoming the desired approach when the aim is a cost-efficient solution that does not consume excessive time.

In this project we present an overview of machine learning and its applications in electromagnetics. Moreover, we discuss what types of antennas can be used in different places based on the electromagnetic readings from that area, using intelligent algorithms for antenna design.


Introduction

In Timis county, a multitude of measurements related to electromagnetic fields were performed. The measurements took place in all the big cities and in the surrounding villages and communes. The main points of interest were hospitals, schools, churches, GSM antennas, other public buildings, and large intersections. The factors that could influence the results are the temperature, the weather, and the time when the measurements took place, as well as the number of dwellings present at these locations and the devices used.

                          Figure 1. Applications of Machine Learning in the field of electromagnetics.

The objective of the project covers "antenna positioning and direction estimation" based on the measurements taken in different parts of the city. The places where high concentrations of electromagnetic fields are detected can be used to determine whether a location is appropriate for installing an antenna, and also to determine whether there are disruption points in the direction the antenna is pointed.

How can machine learning help in electromagnetics? Electromagnetics is everywhere in today's world, in every device that consumes electricity: if a current passes through a wire, it generates an electromagnetic field. We deliberately generate electromagnetic fields when we communicate over Wi-Fi, Bluetooth, or other types of wireless transmission. This kind of wireless communication has a drawback: it is prone to interference. A good example is a place full of electronics where we want to use our mobile phone to connect to a Wi-Fi network; the data can be corrupted, leading to loss of connection or other unintended behavior. Another problem is how to keep multiple mobile phones from interfering with each other.

Machine learning can help by learning the interference pattern and canceling it out. In this case, the error rate can be reduced by as much as 75-90%, yielding better data transmission and less error-correcting work. By using different kinds of models, we can adapt our antenna layout in ways that optimize communication in a tight area when we want focused communication, like a beam.
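A toy illustration of the cancellation idea (fully synthetic signals; it assumes the interference frequency is known, which a real system would have to estimate): learn the interference contribution by least squares and subtract it from the received signal.

```python
# Sketch: cancel a known-frequency interference pattern with least squares.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)

signal = np.sin(2 * np.pi * 5 * t)                       # wanted transmission
interference = 0.8 * np.sin(2 * np.pi * 50 * t + 0.3)    # assumed 50 Hz interference
received = signal + interference + 0.05 * rng.standard_normal(t.size)

# Basis functions for the interference we want to learn.
X = np.column_stack([np.sin(2 * np.pi * 50 * t), np.cos(2 * np.pi * 50 * t)])
coef, *_ = np.linalg.lstsq(X, received, rcond=None)

cleaned = received - X @ coef
print("Residual error power after cancellation:", np.mean((cleaned - signal) ** 2))
```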

By using machine learning, we can create models that work more reliably in harsh conditions such as rain, snow, or other interfering factors. Another way learning algorithms can optimize the electromagnetic setup is by allowing the use of less expensive antennas, since we can compensate with data-driven predictions, leading to a cost-reduction solution.


Bibliography:

- https://www.mdpi.com/2079-9292/10/22/2752/pdf

Thursday, November 3, 2022

House Price Prediction competition on Kaggle

 

House Price Prediction

Abstract:

The House Price Index (HPI) is a popular tool for estimating changes in housing prices. Because housing prices are strongly correlated with other factors like location, area, and population, predicting individual housing prices requires information beyond the HPI. A significant number of papers have used conventional machine learning methods to predict housing prices, but they rarely examine the performance of specific models and tend to ignore less common yet still complex models. Consequently, in order to investigate the differences between several advanced models, this paper uses both traditional and modern machine learning strategies to look into the various effects of features on prediction techniques.

Short introduction:

We decided to enter Kaggle's advanced regression methods competition, taking you along for the ride. If you're new to machine learning and would like to see a project from start to finish, try sticking around. We'll go over the steps we've taken while also attempting to deliver a basic course in machine learning.

 

 

Objective:

The goal of the competition is to forecast home sale prices in Timisoara. Training and test data sets in CSV format, along with a data dictionary, are provided.

Training: Our training data contains many examples of houses, with many features describing each facet of a house. Each house's sale price (the label) is given to us. The training data will be used to "teach" our models.

Testing: The test data set contains the same features as the training data. Because we are attempting to predict the sale price, it is excluded from the test data set. After we have constructed our models, we run the best one on the test data and submit its predictions to the Kaggle leaderboard.

 

 

Task:

Machine learning tasks are typically classified into three types: supervised, unsupervised, and reinforcement learning. Our task for this competition is supervised learning.

The type of machine learning task in front of you is easy to identify based on the data you have and your goal. We've been given housing data with features and labels, and we're supposed to predict the labels for houses that aren't in our training set.

 

Tools:

For this project, we used Python and Jupyter notebooks. Jupyter notebooks are popular among data scientists because they are simple to use and make it easy to show your work step by step.

 

In general, most machine learning projects follow the same procedure. Data ingestion, data cleaning, exploratory data analysis, feature engineering, and finally machine learning are all steps in the process. Because the pipeline is not linear, you may have to switch back and forth between stages. It's important to note this because tutorials frequently lead you to believe that the procedure is much cleaner than it actually is. Bear this in mind because your first machine learning project may be a disaster.

 

 

Data cleaning:

First, we distinguish between null values that represent genuinely missing data and null values that carry a meaning (for example, "no basement"). For categorical variables, we fill in missing values in two ways, imputing with a constant value or imputing with the column mode:
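The original code was shown only as an image in the post; a minimal sketch of this step (the file and column names are illustrative):

```python
# Sketch: fill categorical nulls either with a constant or with the column mode.
import pandas as pd

df = pd.read_csv("train.csv")  # assumed Kaggle training file

# Nulls that carry meaning: impute with a constant such as "None".
meaningful_na_cols = ["PoolQC", "Fence", "FireplaceQu"]   # illustrative columns
for col in meaningful_na_cols:
    df[col] = df[col].fillna("None")

# Truly missing categorical values: impute with the column mode.
mode_cols = ["MSZoning", "Electrical"]                    # illustrative columns
for col in mode_cols:
    df[col] = df[col].fillna(df[col].mode()[0])
```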



Next, we start cleaning the numerical data by filling in missing values using KNN imputation:
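A sketch of this step using scikit-learn's KNNImputer (the number of neighbors and the file name are assumptions):

```python
# Sketch: impute missing numeric values from the 5 nearest neighbors.
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.read_csv("train.csv")  # in the notebook this would be the partially cleaned DataFrame

numeric_cols = df.select_dtypes(include="number").columns
imputer = KNNImputer(n_neighbors=5)
df[numeric_cols] = imputer.fit_transform(df[numeric_cols])
```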

 



 

 

 

 

Next, we take the column names of the numeric features, check the skew of each column, and log-transform the skewed features:
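A sketch of the skew check and log transform (the 0.75 skew threshold is a common choice, assumed here rather than taken from the original notebook):

```python
# Sketch: log-transform numeric features whose skew exceeds a threshold.
import numpy as np
import pandas as pd

df = pd.read_csv("train.csv")  # assumed input, as above

numeric_cols = df.select_dtypes(include="number").columns
skewness = df[numeric_cols].skew().sort_values(ascending=False)
print(skewness.head())

skewed_cols = skewness[skewness.abs() > 0.75].index
df[skewed_cols] = np.log1p(df[skewed_cols])
```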

 



 

 




Exploratory Data Analysis (EDA):

This is frequently where our data visualization journey begins. In machine learning, EDA is used to investigate the quality of our data. Labels: we used a histogram to plot the sale price. The distribution of sale prices is skewed, which is to be expected; it is not uncommon for a neighborhood of reasonably priced houses to contain a few very expensive ones.
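A sketch of the histogram ("SalePrice" is the usual Kaggle column name; the notebook's exact plotting code was shown only as an image):

```python
# Sketch: plot the distribution of the sale price label.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("train.csv")  # assumed Kaggle training file

plt.hist(df["SalePrice"].dropna(), bins=50)
plt.xlabel("Sale price")
plt.ylabel("Number of houses")
plt.title("Distribution of sale prices (right-skewed)")
plt.show()
```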

 




 




Conclusion:

AI is an effective way to predict housing costs with high precision. In the referenced paper, the data is from Mumbai and the method is a decision tree; the Decision Tree Regressor gives an accuracy of 89%.

House price prediction helps sellers (who own a property or build one) and buyers put a fair price on a house. The variables that influence house prices include the number of rooms, the age of the property, the postal region, and so on. In this paper, two more variables are added: air quality and noise pollution.

The system design and architecture described in the article consists of collecting data from different websites, processing and cleaning the data, splitting it into two modules (a training set and a test set), testing, and finally integrating with a UI.

Among Multiple Linear Regression, the Decision Tree Regressor, and KNN, the Decision Tree Regressor fit their dataset best. The Decision Tree Regressor identifies the important features and trains a tree-like model that can forecast future data. Right after building the model and producing results, the next stage is integration with the UI.

The implementation consists of data processing, statistical representation of the dataset, visualization using Matplotlib, and fitting the model using the regressor.

In the end, they plotted predicted versus actual prices, along with the prediction accuracy.

The decision tree algorithm is used in this work to develop a prediction model for estimating the likely selling price of any real-estate property. New parameters such as air quality and crime rate were added to the dataset to help predict prices even more accurately.

 

https://kalaharijournals.com/resources/APRIL_15.pdf

 

 

References

[1] Burkov, A (2019). The Hundred Page Machine Learning Book, pp.84–85

Tuesday, November 1, 2022

Machine learning in trading

             Machine learning is to trading what fire was to the cavemen. That’s how one industry player described the impact of a disruptive technology on a staid industry. AI trading companies use various tools in the AI wheelhouse — machine learning and algorithmic predictions, for example — allowing brokers to customize exchanges and secure stocks. One benefit of AI stock trading is that it can be executed on ordinary networks and PCs. 

When Wall Street statisticians realized they could apply machine learning to many aspects of finance, including investment trading applications, Anthony Antenucci, vice president of global business development at Intelenet Global Services, had insight to share. “They could effectively crunch millions upon millions of data points in real time and capture information that current statistical models couldn’t,” he told ITPro Today. “Machine learning is evolving at an even quicker pace and financial institutions are one of the first adaptors.”

Examples:

Through its 2017 acquisition of 'Neurensic', 'Trading Technologies' has an AI platform that identifies complex trading patterns on a massive scale across multiple markets in real time. Combining machine learning technology with high-speed, big data processing power, the company provides clients with the ability to build their own algorithmic trading platforms. This allows users to automate the entry and exit of positions and reduce the market impact of large orders, as well as the risk of manual errors.

'Numerai' uses machine learning to predict stock market trends and manage a new kind of hedge fund. The firm is a unique player in the market, as it uses encrypted data sets to crowdsource stock market models predicted by AI. The models are sourced from anonymous data scientists who are awarded Numerai’s cryptocurrency, NMR, for providing better models.

'IntoTheBlock' uses AI and deep learning to power its price predictions for a variety of crypto markets. IntoTheBlock’s models are trained on spot, blockchain and derivatives datasets and allow users to access historical data to better inform their trade decisions. 

Each night, the AI-powered, self-learning robo-trading platform “Holly” from 'Trade Ideas' subjects dozens of investment algorithms to more than a million different trading scenarios to increase the probability of alpha in future sessions. The platform then selects the strategies with the highest statistical chance of delivering profitable trades for the upcoming trading day. On average, Holly enters between 5 and 25 trades per day based on various strategies.

'Sentieo' provides a host of financial solutions with the help of AI. The company’s AI-powered financial search engine collects internal and external content into a single shared workspace. Analysts can use its natural language processing to identify the latest news on key financial searches, while individual investors can use its platform to research companies and markets.

In conclusion, machine learning can help us improve our trading game, but according to the studies reviewed, AI on its own still can't do better than a human.


Sources: https://theconversation.com/humans-v-ai-heres-whos-better-at-making-money-in-financial-markets-174937

https://builtin.com/artificial-intelligence/ai-trading-stock-market-tech


Disease Symptom Prediction

Introduction: Machine learning is programming computers to optimize a performance using example data or past data. The development and e...