In this series of blog posts, I’ll introduce components of the Vital AI software used to make predictions via machine learning models.
We’ll use the venerable “20 Newsgroups” dataset, a standard benchmark for text classification, which consists of around 20,000 text articles across 20 categories. The dataset is available here: https://github.com/vital-ai/vital-datasets/tree/master/20news
The primary steps are:
- Set up a data model
- Create the data set
- Define the prediction model
- Run the machine learning training
- Evaluate the trained model
- Use the model to make ongoing predictions
In this example, our predictions will be the categories assigned to the text. For instance, an article about baseball would be assigned a category like “baseball”.
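To make the shape of that workflow concrete before we get into the Vital AI components, here is a minimal sketch of the create/train/evaluate/predict steps in plain Python. It is not the Vital AI API: the tiny multinomial naive Bayes classifier is just a stand-in technique, and the `TRAIN` and `TEST` articles, along with the `train`, `predict`, and `evaluate` helper names, are made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data standing in for the 20 Newsgroups articles.
TRAIN = [
    ("the pitcher threw a fastball in the ninth inning", "baseball"),
    ("home run wins the baseball game", "baseball"),
    ("the goalie stopped the puck on the ice", "hockey"),
    ("hockey team scores on a power play", "hockey"),
]
TEST = [
    ("a fastball and a home run", "baseball"),
    ("the puck slid across the ice", "hockey"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Collect the counts a multinomial naive Bayes model needs."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of articles
    vocab = set()
    for text, category in examples:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the category with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # Start from the log prior of the category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            count = word_counts[category][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


def evaluate(model, examples):
    """Fraction of examples whose predicted category matches the label."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


model = train(TRAIN)                    # run the training
accuracy = evaluate(model, TEST)        # evaluate the trained model
prediction = predict(model, "extra inning baseball game")  # ongoing prediction
print(accuracy, prediction)
```

The real dataset has 20 categories and thousands of articles, so the actual pipeline needs a proper data model, a held-out test split, and a more careful tokenizer, but the division of labor between the steps is the same.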