Part of a series to introduce the Vital AI software used to make predictions.
Go to beginning: Using the Vital AI software to make predictions
At the heart of any data-driven application is a data model – but often the data model is never fully captured in one place. Instead, it is spread across schema files, databases, source code, and the minds of the developers working on the application.
By capturing the data model in one place:
- Developers can easily reference it across different software components
- Developers have a single place to look for data definitions
- Code can be generated directly from the data model
- Errors can be detected much more easily by checking data against the model
Typical schema formats are very limited in what they can specify – to truly capture the full model of a Big Data application, a much richer format is needed.
At Vital AI, we use the OWL standard to describe Big Data Models. OWL is a standard used to create data models – also known as “ontologies.”
Background and documentation on the OWL standard are available from the W3C.
To edit our data model, we’ll use Protégé, an open-source graphical ontology editor.
It’s available here: http://protege.stanford.edu/
For our data model, we need a single class (type of data object) to represent the articles in the 20 Newsgroup dataset.
Vital AI provides a core data model defining the most fundamental data types, and a base application data model which defines typical data objects such as “User” and “Document”.
For our Twenty Newsgroup dataset, we’ll extend the “Document” class and create the TwentyNewsArticle class.
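As a rough sketch, a subclass declaration like this can be expressed in OWL using Turtle syntax. The namespace IRIs below are assumptions for illustration – the actual Vital AI ontology IRIs will differ:

```turtle
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
# Hypothetical namespaces -- the real Vital AI base model IRIs may differ.
@prefix vital: <http://vital.ai/ontology/vital#> .
@prefix tn:    <http://example.org/twentynews#> .

# TwentyNewsArticle extends the base "Document" class.
tn:TwentyNewsArticle a owl:Class ;
    rdfs:subClassOf vital:Document ;
    rdfs:label "Twenty News Article" .
```

In Protégé, this amounts to creating a new class under Document in the class hierarchy view.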
The Vital AI “vitalsigns” application generates code from a data model, so our data object definitions can be used in our software.
From the command line we can enter:
vitalsigns generate -o twentynews-1.0.1.owl -p com.twentynews.model -j twentynews-groovy-1.0.2.jar
to create a JAR file which we can then include in our application.
In our IDE, we can use our new data model directly in our code.
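As a rough illustration of what that looks like, here is a minimal Java sketch. The `TwentyNewsArticle` class below is a hand-written stand-in for the vitalsigns-generated class – the real generated API lives in the JAR produced above, and its property accessors may differ:

```java
// Stand-in for the vitalsigns-generated data object class.
// The actual generated class comes from twentynews-groovy-1.0.2.jar.
class TwentyNewsArticle {
    String title;
    String body;
    String newsgroup; // e.g. "comp.graphics"
}

public class Demo {
    public static void main(String[] args) {
        // Create and populate a data object as we would in application code.
        TwentyNewsArticle article = new TwentyNewsArticle();
        article.title = "Need help with OpenGL lighting";
        article.newsgroup = "comp.graphics";
        System.out.println(article.newsgroup); // prints comp.graphics
    }
}
```

Because the classes are generated from the OWL model, the compiler catches mismatches between our code and the data model.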
In addition to our data objects, we need to define the categories we want to use in our predictions.
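One hypothetical way to model those categories is as individuals in the same ontology – the names and IRIs here are assumptions, not the actual Vital AI modeling:

```turtle
# Hypothetical category modeling -- the actual Vital AI approach may differ.
tn:Category a owl:Class ;
    rdfs:label "Category" .

tn:category_comp_graphics a tn:Category ;
    rdfs:label "comp.graphics" .
```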