Data no longer exclusively resides in on-premises data centres, with users gaining insight via a data extract from a relational database.
There is a data explosion across multi-cloud, on-premises and Software-as-a-Service applications. In addition, analytics is evolving from reactive processes to empowering users to make effective decisions based on data-driven predictions. When will a component fail? What will your customer purchase next, or will you pay back a loan?
Do you have aspirations to grow your business with predictive analytics? Machine learning involves using historical information to make predictions about the future. Here are three steps to get started:
It’s essential to clarify your scope before starting:
Machine learning expands into all areas of our lives. Do you want to predict a numeric value, such as journey time, fund value, insurance payout or profit? These projects are known as regression experiments.
Another type of machine learning is classification, where you want to predict from a discrete list. Examples are exam results (A, B, C etc.), sporting fixture results (win, draw, loss) or political affiliation. A subset of classification is a binary classification where you have two options, true or false. Will someone be affected by a disease, is a transaction fraudulent or is email spam?
Is machine learning integrated into your website or app, included in a Power BI dataset or maybe you will batch-predict based on spreadsheet data. Who are the consumers of your predictions, and are they customer-facing?
If you predicted correctly 100% of the time, there would be some poor bookmakers out there. What is your definition of success? Is it to be more accurate than a random guess, 75% accuracy or something else?
What data is at your disposal to test your experiments? Who owns the data, how will you authenticate and is the quality sufficient to support your needs? A mature data estate includes lineage and catalog tooling to clarify the data’s origins, journey and transformations.
A supervised machine learning experiment is one where the attribute you want to predict exists within the dataset. This attribute is known as the “Label”, which you use to verify your model’s accuracy before making predictions using new incoming data. “Features” are the attributes in your dataset you use to predict the Label.
You have clarified your project parameters, so now you can get started.
Start by profiling your data of interest. For each attribute, establish the count of empty values and cardinality (uniqueness). Additionally, for numeric features, what is the min, max, mean and standard deviation. The goal here is to establish the features you will use to make predictions. Attributes with many empty values, very low or very high cardinality, are probably not good candidates.
I recommend getting started by using the Automated ML feature. Automated ML allows you to specify a dataset and configuration settings to create a machine learning experiment quickly. Microsoft provides a guided demo at https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-automated-ml-for-ml-models.
As you become more proficient with Azure ML, you can bypass the automated feature and create experiments using drag-and-drop graphical pipelines or code within notebooks. Some techniques provided by Azure ML you will use are:
Don’t be intimidated by the algorithms or terminology if you are not familiar with those names. You can use the algorithms without a masters degree in mathematics or statistics. Your task is to interpret the results and try different combinations until you have reached your definition of success.
An example pipeline to predict the price of cars using the graphical drag-and-drop method:
It is essential to bring an ethical approach to machine learning:
Those are items for consideration. Also, ensure you implement security and compliance best practices following the zero-trust model.
Now you’re ready to get started with Azure Machine Learning. Navigate to https://aka.ms/azureml and experiment for free following the tutorials at https://docs.microsoft.com/en-us/azure/machine-learning/.
A great place to start is to use the Automated ML feature with a pilot project following the three Es above. Good luck!