Creating a predictive model training set with Vital AI

This post is part of an ongoing series introducing the Vital AI software for making predictions.

Go to beginning: Using the Vital AI software to make predictions

Before a machine learning algorithm can build a predictive model, the source data must be assembled into a dataset.

The Twenty Newsgroups source data consists of around 20,000 individual text files, one per article.
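As a rough sketch of how such files might be read, the Groovy below splits one article into its subject and body, and walks a directory tree to visit every article. The helper names parseArticle and eachArticle are hypothetical and not part of the Vital AI software. The layout root/<newsgroup>/<article id>, the "Subject:" header and the ISO-8859-1 encoding are assumptions about how the files are laid out, so adjust them to match your copy of the data.

```groovy
// Sketch only: parseArticle and eachArticle are hypothetical helpers,
// not part of the Vital AI software.

// Split one raw article into its Subject header and its body text.
// Assumes headers end at the first blank line; the rest is the body.
Map parseArticle(String raw) {
    def lines = raw.readLines()
    int split = lines.findIndexOf { it.trim().isEmpty() }
    if (split < 0) split = lines.size()   // no blank line: treat everything as headers
    def headers = lines.take(split)
    def subjectLine = headers.find { it.toLowerCase().startsWith('subject:') }
    def subject = subjectLine ? subjectLine.substring('subject:'.length()).trim() : ''
    def body = lines.drop(split + 1).join('\n').trim()
    return [subject: subject, body: body]
}

// Visit every article under root, assuming root/<newsgroup>/<article id>.
// The handler receives (newsgroup, id, subject, body).
void eachArticle(File root, Closure handler) {
    root.eachDir { File groupDir ->
        groupDir.eachFile { File f ->
            if (!f.isFile()) return
            // Assumed encoding; a single-byte charset avoids decode failures
            def parsed = parseArticle(f.getText('ISO-8859-1'))
            handler(groupDir.name, f.name, parsed.subject, parsed.body)
        }
    }
}

// Made-up sample article, for illustration only
def sample = '''From: someone@example.org
Subject: Re: Orbital mechanics question
Organization: Example

Does anyone know a good reference?
Thanks.'''

def parsed = parseArticle(sample)
println parsed.subject
println parsed.body
```

The four values passed to the handler (newsgroup, id, subject, body) are the same ones the conversion script needs to fill in each data object.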

The Vital AI software uses a standardized data format for datasets, with each data object conforming to the data model.

To convert the source data into the Vital AI data format, we use a simple script.

The key lines of the script are:

...

    // Build one data object per newsgroup article
    def doc = new TwentyNewsDocument()
    doc.URI = "http://example.org/twentynews/${newsgroup}/${id}"
    doc.title = subject
    doc.body = body
    // Record the article's newsgroup as a category URI
    doc.newsGroup = 'http://vital.ai/twentynews/Category/' + newsgroup

    // Write the document as its own block
    writer.startBlock()
    writer.writeGraphObject(doc)
    writer.endBlock()
}

The resulting data file is in the “Vital Block” format, so called because data objects can be grouped together into “blocks” for processing.
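To make the one-object-per-block pattern concrete, here is a minimal, self-contained sketch. Both classes are stand-ins written for this illustration: they mimic only the property names and the three writer calls that appear in the script, and they are not the Vital AI classes. The stand-in writer keeps blocks in memory and says nothing about the actual on-disk block format. The two sample articles are made up.

```groovy
// Stand-ins for illustration only. These classes are NOT the Vital AI API;
// they mimic just the properties and calls used in the conversion script.
class TwentyNewsDocument {
    String URI
    String title
    String body
    String newsGroup
}

class InMemoryBlockWriter {
    List<List> blocks = []        // every finished block, in write order
    private List current = null   // the block being built, if any

    void startBlock() { current = [] }

    void writeGraphObject(Object obj) {
        if (current == null) throw new IllegalStateException('startBlock() not called')
        current << obj
    }

    void endBlock() {
        if (current == null) throw new IllegalStateException('no open block')
        blocks << current
        current = null
    }
}

// Made-up sample input, for illustration only
def articles = [
    [newsgroup: 'sci.space', id: '60001', subject: 'Orbit question', body: 'How high is LEO?'],
    [newsgroup: 'rec.autos', id: '10210', subject: 'Brake noise', body: 'Squeal at low speed.'],
]

def writer = new InMemoryBlockWriter()
articles.each { a ->
    def doc = new TwentyNewsDocument()
    doc.URI = "http://example.org/twentynews/${a.newsgroup}/${a.id}"
    doc.title = a.subject
    doc.body = a.body
    doc.newsGroup = 'http://vital.ai/twentynews/Category/' + a.newsgroup

    // Each document goes into a block of its own, as in the script
    writer.startBlock()
    writer.writeGraphObject(doc)
    writer.endBlock()
}

println "${writer.blocks.size()} blocks written"
```

In this sketch, calling writeGraphObject more than once between startBlock and endBlock would place several objects in the same block, which is the grouping the format's name refers to.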

Next: Processing datasets using Machine Learning on Hadoop with Vital AI
