Introduction to Big Data Models with Vital AI

Part of a series to introduce the Vital AI software used to make predictions.

Go to beginning: Using the Vital AI software to make predictions

At the heart of any data-driven application is a data model, but often that model is never fully captured. Instead, it is spread across schema files, databases, source code, and the minds of the developers working on the application.

By capturing the data model in one place:

  • Developers can easily reference it across different software components
  • Developers have a single place to look for data definitions
  • Code can be generated directly from the data model
  • Errors can be detected much more easily by checking data against the model
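To make the last point concrete, here is a minimal sketch of checking data against one shared model definition. It assumes a toy in-memory model format (a dictionary of class names to property types) that stands in for a real model such as an OWL ontology. The names `DATA_MODEL` and `validate` and the `title`/`body` properties are hypothetical, not part of Vital AI.

```python
from typing import Any

# Hypothetical single-source data model: class name -> {property name: expected type}.
# This toy format only stands in for a real model (e.g. an OWL ontology).
DATA_MODEL: dict[str, dict[str, type]] = {
    "Document": {"title": str, "body": str},
}


def validate(class_name: str, record: dict[str, Any]) -> list[str]:
    """Return a list of problems found when checking `record` against DATA_MODEL."""
    schema = DATA_MODEL.get(class_name)
    if schema is None:
        return [f"unknown class: {class_name}"]
    errors: list[str] = []
    for prop, value in record.items():
        expected = schema.get(prop)
        if expected is None:
            errors.append(f"unknown property: {prop}")
        elif not isinstance(value, expected):
            errors.append(f"{prop}: expected {expected.__name__}, got {type(value).__name__}")
    return errors


print(validate("Document", {"title": "Hello", "body": "World"}))  # []
print(validate("Document", {"title": 42, "author": "x"}))
# ['title: expected str, got int', 'unknown property: author']
```

Because every component checks records against the same `DATA_MODEL`, a change to the model is picked up everywhere at once instead of being copied into separate schemas.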

Typical schema formats are limited in what they can express; to capture the full model of a Big Data application, a much richer format is needed.

At Vital AI, we use the OWL (Web Ontology Language) standard to describe Big Data Models. OWL is a W3C standard for creating data models, also known as “ontologies.”

Some background on the standard is here:
http://en.wikipedia.org/wiki/Web_Ontology_Language

And documentation on the standard is here:
http://www.w3.org/TR/owl2-overview/

To edit our data model, we’ll use Protégé, an open-source application with a graphical user interface.

It’s available here: http://protege.stanford.edu/

For our data model, we need a single class (type of data object) to represent the articles in the 20 Newsgroups dataset.

Vital AI provides a core data model defining the most fundamental data types, and a base application data model which defines typical data objects such as “User” and “Document”.

For the 20 Newsgroups dataset, we’ll extend the “Document” class to create the TwentyNewsArticle class.
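As a rough analogy for what extending a class in the data model means, the sketch below mirrors the Document/TwentyNewsArticle relationship with plain Python dataclasses. It is not the code vitalsigns generates, and the `uri`, `title`, `body`, and `newsgroup` properties are hypothetical placeholders rather than the actual properties in the Vital AI models.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Stand-in for the base application model's Document class (fields are hypothetical)."""
    uri: str
    title: str = ""
    body: str = ""


@dataclass
class TwentyNewsArticle(Document):
    """Extends Document with a property specific to the 20 Newsgroups data (hypothetical name)."""
    newsgroup: str = ""


article = TwentyNewsArticle(
    uri="urn:example:article-1",  # hypothetical identifier
    title="Re: GPU prices",
    body="Has anyone compared these cards?",
    newsgroup="comp.graphics",
)
print(isinstance(article, Document))  # True
```

Since TwentyNewsArticle inherits from Document, any code written against Document also accepts a TwentyNewsArticle, which is the main benefit of extending the base model rather than defining a new, unrelated class.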


The Vital AI “vitalsigns” application generates code from a data model, so our data object definitions can be used in our software.

From the command line we can enter:

vitalsigns generate -o twentynews-1.0.1.owl -p com.twentynews.model -j twentynews-groovy-1.0.2.jar

to create a JAR file which we can then include in our application.
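The idea behind generating code from a data model can be illustrated with a toy generator. This is a minimal sketch under stated assumptions: the `MODEL` format and `generate_class_source` function are hypothetical, and the sketch emits Python source loaded with `exec`, whereas vitalsigns produces a JAR for the JVM. Its purpose is only to show that class definitions can be derived mechanically from one model description.

```python
# Hypothetical model description: class name -> (parent class or None, property names).
# Parents are listed before children so generated classes can refer to them.
MODEL = {
    "Document": (None, ["title", "body"]),
    "TwentyNewsArticle": ("Document", ["newsgroup"]),
}


def generate_class_source(name: str) -> str:
    """Emit Python source for one model class (a toy analogue of code generation)."""
    parent, props = MODEL[name]
    lines = [f"class {name}({parent or 'object'}):", "    def __init__(self, **kwargs):"]
    if parent:
        # Let the parent class initialize its own properties first.
        lines.append("        super().__init__(**kwargs)")
    for prop in props:
        lines.append(f"        self.{prop} = kwargs.get({prop!r})")
    return "\n".join(lines)


# Generate and load every class, in model order, into one namespace.
namespace: dict = {}
for class_name in MODEL:
    exec(generate_class_source(class_name), namespace)

article = namespace["TwentyNewsArticle"](title="Re: GPU prices", newsgroup="comp.graphics")
print(article.title, article.newsgroup)  # Re: GPU prices comp.graphics
```

Changing `MODEL` and regenerating updates every class at once, which is the same reason the data model, not hand-written code, is treated as the source of truth.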

In our IDE, we can use our new data model directly in our code.


In addition to our data objects, we need to define the categories we want to use in our predictions.

Next: Defining categories to use in predictions 
