Creating a predictive model training set with Vital AI

Part of an ongoing series to introduce the Vital AI software used to make predictions.

Go to beginning: Using the Vital AI software to make predictions

Before a machine learning algorithm can process the data and build a predictive model, we must first create a dataset.

The Twenty Newsgroups source data consists of around 20,000 individual text files, one per article.
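
The corpus is usually distributed as one directory per newsgroup, each holding one file per article, named by a numeric id. Below is a minimal Groovy sketch that counts the articles in each group before conversion. It assumes that `<root>/<newsgroup>/<id>` layout. `countArticles` and the `20_newsgroups` path are hypothetical and are not part of the Vital AI software.

```groovy
import groovy.io.FileType

// Hypothetical helper: count the article files in each newsgroup directory,
// assuming the corpus is laid out as <root>/<newsgroup>/<id>.
Map<String, Integer> countArticles(File root) {
    Map<String, Integer> counts = new TreeMap<>()   // sorted by newsgroup name
    root.eachDir { File groupDir ->
        int n = 0
        groupDir.eachFile(FileType.FILES) { n++ }   // one file per article
        counts[groupDir.name] = n
    }
    return counts
}

// Example usage; the path is hypothetical, and the block is skipped if it does not exist.
def corpus = new File('20_newsgroups')
if (corpus.isDirectory()) {
    Map<String, Integer> perGroup = countArticles(corpus)
    perGroup.each { group, n -> println "${group}: ${n}" }
    println "total: ${perGroup.values().sum()}"
}
```

The total printed this way should come out near the 20,000 files mentioned above.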

The Vital AI software uses a standardized data format for datasets, with each data object conforming to the data model.

To convert the source data into the Vital AI data format, we use a simple script.

The key lines of the script are:


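// Create a new data object of the TwentyNewsDocument class
def doc = new TwentyNewsDocument()
// Identify the object by a URI built from its newsgroup and article id
doc.URI = "${newsgroup}/${id}"
// The article's subject becomes the document title
doc.title = subject
// The article text
doc.body = body
// Store the newsgroup name as a plain String
doc.newsGroup = '' + newsgroup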
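
The key lines above assume that `subject`, `body`, `newsgroup` and `id` have already been read from an article file. The following is a minimal Groovy sketch of that step for a single file. It rests on two assumptions: the `<newsgroup>/<id>` directory layout, and the usual Usenet header block that ends at the first blank line. `NewsDoc` and `toDocument` are hypothetical names. `NewsDoc` only stands in for the `TwentyNewsDocument` class, whose real definition comes from the Vital AI data model.

```groovy
// Hypothetical stand-in for the TwentyNewsDocument class generated from the
// Vital AI data model; only the four properties used above are modelled.
class NewsDoc {
    String URI
    String title
    String body
    String newsGroup
}

// Hypothetical helper: build one document from an article stored at <newsgroup>/<id>.
NewsDoc toDocument(File articleFile) {
    String newsgroup = articleFile.parentFile.name   // directory name = newsgroup
    String id = articleFile.name                     // file name = article id
    // Latin-1 maps every byte to a character, so stray non-UTF-8 bytes cannot
    // cause a decoding error; Windows line endings are normalized to '\n'.
    String text = articleFile.getText('ISO-8859-1').replace('\r\n', '\n')

    // Headers end at the first blank line; everything after it is the body.
    int split = text.indexOf('\n\n')
    String headers = split >= 0 ? text.substring(0, split) : text
    String body = split >= 0 ? text.substring(split + 2).trim() : ''
    String subjectLine = headers.readLines().find { it.startsWith('Subject:') }
    String subject = subjectLine ? subjectLine.substring('Subject:'.length()).trim() : ''

    // The same four assignments as in the script
    def doc = new NewsDoc()
    doc.URI = "${newsgroup}/${id}"
    doc.title = subject
    doc.body = body
    doc.newsGroup = newsgroup
    return doc
}
```

To convert the whole corpus, you could call `toDocument` on every file under the root directory, for example with Groovy's `eachFileRecurse(FileType.FILES)`.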

The resulting data file is in the “Vital Block” format. It is called “block” format because data objects can be grouped together into “blocks” for processing.

Next: Processing datasets using Machine Learning on Hadoop with Vital AI
