
Machine Learning for Newbies

Artificial intelligence (AI) and machine learning (ML) have become very popular fields. Many people consider them the next big thing that will shape the tech industry over the next few years, and major IT companies are competing to offer their customers the best AI and ML capabilities. But what exactly is the difference between the two?


Figure: Machine learning is a subset of artificial intelligence.

Artificial intelligence is the ability of machines to imitate human intelligence. Machine learning refers to algorithms that give machines this kind of intelligence by learning automatically from data, rather than by following rules a programmer wrote by hand. Machine learning is therefore a subset of artificial intelligence.
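To make "learning from data" concrete, here is a toy Python sketch of my own (not taken from any library). Rather than a person hard-coding a rule such as "25 °C or warmer means hot", the hypothetical `learn_threshold` function tries each observed value as a cut-off and keeps the one that agrees with the most labeled examples. The temperatures and labels are made up for illustration.

```python
def learn_threshold(examples):
    """Return the cut-off that classifies the most labeled examples correctly.

    examples: list of (value, label) pairs, where label is True or False.
    A value is predicted True when it is >= the cut-off.
    """
    best_t, best_correct = None, -1
    for t, _ in examples:  # try every observed value as a candidate cut-off
        correct = sum((v >= t) == label for v, label in examples)
        if correct > best_correct:  # keep the first cut-off with the best score
            best_t, best_correct = t, correct
    return best_t


# Made-up data: (temperature in °C, did people call it "hot"?)
data = [(12, False), (18, False), (21, False), (23, True), (27, True), (31, True)]
threshold = learn_threshold(data)
print(threshold)        # → 23
print(30 >= threshold)  # → True: a new 30 °C day is classified as hot
```

With a different set of examples the learned cut-off changes too, which is the point: the rule comes from the data, not from the programmer.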


We also hear a lot about large language models (LLMs) such as GPT-3, GPT-4, Amazon's Titan, Anthropic's Claude, and Google's Gemini. These are part of artificial intelligence and are built using ML techniques.


In this article we will focus on how ML can be used to make predictions. Once we understand these basics, it becomes easier to understand how more complex models such as LLMs work.


Types of ML

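At a high level, based on the type of algorithm used, ML can be divided into two types:

  • supervised learning, where the model learns from labeled examples (inputs paired with the correct answers), and

  • unsupervised learning, where the model finds patterns in data that has no labels.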

Supervised learning includes types such as:

  • regression, for predicting numbers (for example, a house price), and

  • classification, for predicting categories (for example, whether an email is spam).
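The two supervised tasks can be sketched in plain Python, assuming a single numeric feature and made-up data. `fit_line` is ordinary least-squares regression (it predicts a number) and `nearest_neighbor` is a 1-nearest-neighbor classifier (it predicts a label). Both are simple stand-ins chosen for illustration, not the only or best models for these tasks.

```python
def fit_line(xs, ys):
    """Regression: ordinary least squares for y = w * x + b (predicts a number)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


def nearest_neighbor(train, query):
    """Classification: 1-nearest-neighbor on one feature (predicts a label).

    train: list of (feature_value, label) pairs.
    """
    _, label = min(train, key=lambda pair: abs(pair[0] - query))
    return label


# Regression on made-up data: house size in m² -> price in thousands
w, b = fit_line([50, 80, 100, 120], [150, 240, 300, 360])
print(w * 90 + b)  # → 270.0

# Classification on made-up data: weight in kg -> "cat" or "dog"
pets = [(4, "cat"), (5, "cat"), (25, "dog"), (30, "dog")]
print(nearest_neighbor(pets, 22))  # → dog
```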

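Unsupervised learning includes types such as:

  • clustering, which groups similar data points together. A related idea appears in LLMs, which represent words as points (embeddings) placed so that words with similar meanings end up close together.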
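k-means is one common clustering algorithm (used here as a stand-in; there are many others). The sketch below runs it on made-up one-dimensional data, and the two starting centers are an assumption of the example. Notice that no labels are given: the algorithm only groups values that are close together.

```python
def kmeans_1d(points, centers, iterations=10):
    """Tiny k-means on 1-D data: groups nearby values without any labels."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 10.0])
print(centers)   # → [1.5, 8.5]
print(clusters)  # → [[1.0, 1.5, 2.0], [8.0, 8.5, 9.0]]
```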

Now suppose you want to use ML for a project. Where should you start?


An ML project brings complexities such as:

  • the data needed to train the model, and

  • computing resources such as GPUs and storage.

All of these cost money, and training takes time. Hence we should be careful about when to take this route. To decide whether to use ML for your project, consider the following four criteria:

  1. The logic is too complex to code by hand as explicit rules.

  2. A hand-coded solution cannot scale easily.

  3. The behavior needs to change as the data changes.

  4. The output needs to be responsive or automated.

If one or more of these conditions apply, it is worth investing time in ML for the project. An ML project broadly goes through the following phases:

  1. Gather and prepare the data.

  2. Split the data into training and testing datasets.

  3. Train a model.

  4. Test the model.

  5. Adjust the model parameters or the datasets based on the results. For example, you could:

      • split the values in one field into multiple fields,

      • add more rows to the training dataset,

      • try different ML models, or

      • increase the training time.

  6. Repeat until the model reaches the accuracy you need.
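The split, train, test, and repeat loop above can be sketched in plain Python. Everything here is a hypothetical illustration: `train_test_split` is a hand-written helper (not scikit-learn's function of the same name), the two "models" are deliberately simple, and the 0.9 accuracy target and the data are made up.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]  # (train, test)


def accuracy(model, rows):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def train_majority(train):
    """Baseline model: always predict the most common training label."""
    labels = [y for _, y in train]
    most_common = max(set(labels), key=labels.count)
    return lambda x: most_common


def train_threshold(train):
    """Predict True at or above a cut-off halfway between the two classes."""
    max_false = max(x for x, y in train if not y)
    min_true = min(x for x, y in train if y)
    cut = (max_false + min_true) / 2
    return lambda x: x >= cut


def run_experiment(rows, trainers, target=0.9):
    """Split once, then try models in turn until one reaches the target accuracy."""
    train, test = train_test_split(rows)
    results = {}
    for name, trainer in trainers:
        model = trainer(train)                 # train on the training split only
        results[name] = accuracy(model, test)  # test on held-out data
        if results[name] >= target:            # good enough: stop here
            return name, results
    return None, results                       # nothing reached the target


# Made-up data: values 0 to 19 are labeled False, values 30 to 49 are labeled True
rows = [(x, False) for x in range(20)] + [(x, True) for x in range(30, 50)]
best, results = run_experiment(rows, [("majority", train_majority),
                                      ("threshold", train_threshold)])
print(best, results)
```

On this data the baseline that always predicts the most common label scores at most 0.5, so the loop moves on to the threshold model, which separates the two groups. In a real project the "adjust" step could just as well be adding data or changing a parameter instead of swapping the model.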

ML problems are also grouped into different modalities, depending on the problem we want to solve.

In this context, a modality is the type of data the problem works with. Some examples are:

  • text (for example, text classification),

  • images (for example, image classification),

  • time series,

  • tabular data.

That said, training a model can quickly get complex, depending on which model you want to use. To keep things simple, we can use AutoGluon, a library that handles many of these tasks and gives us one unified interface to train and test many different ML models.
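To show what "one unified interface" means, here is a hypothetical `SimplePredictor` class in plain Python. It is NOT AutoGluon's API, only an illustration of the idea: one `fit` call trains several candidate models, scores each on held-out validation data, ranks them in a leaderboard, and uses the best one for `predict`. In AutoGluon itself, the entry point for tabular data is `TabularPredictor` from `autogluon.tabular`, which provides `fit`, `predict`, and `leaderboard` methods; check its documentation for the real arguments. The data below is made up.

```python
class SimplePredictor:
    """Hypothetical 'one interface, many models' sketch (not AutoGluon's API)."""

    def __init__(self, trainers):
        self.trainers = trainers  # name -> function(train_rows) -> predict(x)
        self.scores = {}
        self.best_model = None

    def fit(self, train_rows, valid_rows):
        best_name = None
        for name, trainer in self.trainers.items():
            model = trainer(train_rows)
            # Mean absolute error on held-out validation rows (lower is better)
            mae = sum(abs(model(x) - y) for x, y in valid_rows) / len(valid_rows)
            self.scores[name] = mae
            if best_name is None or mae < self.scores[best_name]:
                best_name, self.best_model = name, model
        return self

    def leaderboard(self):
        """Models sorted from best (lowest error) to worst."""
        return sorted(self.scores.items(), key=lambda item: item[1])

    def predict(self, x):
        return self.best_model(x)


def train_mean(rows):
    """Candidate 1: always predict the average target value."""
    avg = sum(y for _, y in rows) / len(rows)
    return lambda x: avg


def train_line(rows):
    """Candidate 2: least-squares straight line y = w * x + b."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    w = (sum((x - mx) * (y - my) for x, y in rows)
         / sum((x - mx) ** 2 for x, _ in rows))
    b = my - w * mx
    return lambda x: w * x + b


train = [(1, 3), (2, 5), (3, 7), (4, 9)]  # made-up data following y = 2x + 1
valid = [(5, 11), (6, 13)]
predictor = SimplePredictor({"mean": train_mean, "linear": train_line}).fit(train, valid)
print(predictor.leaderboard())  # → [('linear', 0.0), ('mean', 6.0)]
print(predictor.predict(10))    # → 21.0
```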


I have a GitHub repository with an example demo, written as a JupyterLab notebook, that you can run on your own machine: https://github.com/appsec-airito/AutoGluon_Tabular_ML/tree/main


The repository currently contains a ready-to-run demo for tabular data. I will add demo code and data for other modalities in the coming days.


The next step to get better at this is practice! Find some sample data: you can either use your own data or download datasets from websites such as Kaggle and Hugging Face. Then prepare the data and experiment with the code in the repository above to train your own models.
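Data preparation often starts with cleaning. As a minimal sketch, assuming a CSV file with numeric columns, the hypothetical `prepare` function below uses Python's built-in `csv` module to skip rows with missing values and convert text to numbers. The inline CSV is made up and stands in for a downloaded dataset.

```python
import csv
import io

# A tiny made-up CSV standing in for a downloaded dataset
raw = io.StringIO(
    "size_m2,rooms,price\n"
    "50,2,150\n"
    "80,,240\n"
    "100,4,300\n"
    "120,4,\n"
)


def prepare(csv_file, numeric_columns):
    """Drop rows with missing values and convert numeric columns from text."""
    cleaned = []
    for row in csv.DictReader(csv_file):
        if any(row[col].strip() == "" for col in numeric_columns):
            continue  # skip incomplete rows
        cleaned.append({col: float(row[col]) for col in numeric_columns})
    return cleaned


rows = prepare(raw, ["size_m2", "rooms", "price"])
print(rows)
# → [{'size_m2': 50.0, 'rooms': 2.0, 'price': 150.0}, {'size_m2': 100.0, 'rooms': 4.0, 'price': 300.0}]
```

Dropping incomplete rows is the simplest option; filling in missing values (imputation) is a common alternative when data is scarce.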


All the best!

For some useful AutoGluon cheat sheets, check here.

