Part of an ongoing series to introduce the Vital AI software used to make predictions.
Go to beginning: Using the Vital AI software to make predictions
Before a machine learning algorithm can build a predictive model, the source data must first be assembled into a dataset.
The Twenty Newsgroups source data comprises around 20,000 individual text files, one per article.
The Vital AI software uses a standardized data format for datasets, with each data object conforming to the data model.
To convert the source data into the Vital AI data format, we use a simple script.
The key lines of the script are:
...
def doc = new TwentyNewsDocument()
doc.URI = "http://example.org/twentynews/${newsgroup}/${id}"
doc.title = subject
doc.body = body
doc.newsGroup = 'http://vital.ai/twentynews/Category/' + newsgroup

writer.startBlock()
writer.writeGraphObject(doc)
writer.endBlock()
}
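The key lines above assume that the surrounding loop has already derived newsgroup, id, subject, and body for each article, and that writer is the block writer opened earlier in the script. As a minimal sketch only (not the exact script), one way those variables might be produced from the standard one-directory-per-newsgroup layout is:

// Illustrative sketch of the surrounding loop; the directory name, header
// parsing, and variable names here are assumptions, not the published script.
// "writer" is assumed to be the Vital block writer opened earlier.
new File('20news-bydate-train').eachDir { dir ->
    def newsgroup = dir.name                      // e.g. "sci.space"
    dir.eachFile { f ->
        def id = f.name                           // article id taken from the file name
        def lines = f.readLines()
        // use the Subject: header as the title, and everything after the first
        // blank line (the header/body separator) as the article body
        def subject = lines.find { it.startsWith('Subject:') }?.substring('Subject:'.length())?.trim() ?: ''
        def body = lines.drop(lines.indexOf('') + 1).join('\n')

        // the key lines shown above then run for each article
        def doc = new TwentyNewsDocument()
        doc.URI = "http://example.org/twentynews/${newsgroup}/${id}"
        doc.title = subject
        doc.body = body
        doc.newsGroup = 'http://vital.ai/twentynews/Category/' + newsgroup

        writer.startBlock()
        writer.writeGraphObject(doc)
        writer.endBlock()
    }
}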
The resulting data file is in the “Vital Block” format, so called because related data objects can be grouped together into “blocks” for processing.
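In this dataset each article is written as its own one-object block, but the grouping works by writing however many graph objects belong together between a single startBlock()/endBlock() pair. A conceptual sketch, assuming a hypothetical relatedObj that should travel with the document:

// Conceptual sketch only: a block may hold more than one graph object.
// relatedObj is a hypothetical object kept together with the document
// so downstream processing sees them as one unit.
writer.startBlock()
writer.writeGraphObject(doc)          // the document itself
writer.writeGraphObject(relatedObj)   // a hypothetical related object
writer.endBlock()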
Next: Processing datasets using Machine Learning on Hadoop with Vital AI