SmartData, NoSQL Now! conference talk: MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark

I’m excited for my talk tomorrow at the NoSQL Now! conference! I hope those in the San Francisco/San Jose area can make it, and for those who do, please join me on Tuesday at 2pm!

— Marc Hadfield

Further details:

Tuesday, August 18, 2015

02:00 PM – 02:45 PM

SmartData, NoSQL Now! conference in San Jose, California

SmartData conference:

NoSQL Now conference:

MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark

Each database has its strengths and weaknesses for different data access profiles, and we should endeavor to use the right tool for the right job.

However, adding another infrastructure component greatly increases not only the management effort but also the development effort to integrate and maintain connections across multiple data repositories, let alone the effort of keeping the data synchronized.

In this talk, we’ll discuss MetaQL, a common query layer across database technologies including NoSQL, SQL, SPARQL, and Spark.

Using a common query layer lessens the burden on developers, allows using the right database for the right job, and opens up data to analysis that was previously unavailable, providing new and unexpected value.

In this talk, we will discuss:

  • Database Data Model and Schema
  • Java/JVM Query Builder, driven by Schema
  • Query constructs for “Select”, “Graph”, “Path”, and “Aggregation” Queries
  • NoSQL & SQL Databases
  • SPARQL RDF Databases, including AllegroGraph
  • Apache Spark, including SparkSQL and the new DataFrame API
  • ETL, Transactions, and Data Synchronization
  • Seamless queries across databases
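To make the "common query layer" idea concrete, here is a minimal sketch of a schema-driven query builder that compiles one query description into backend-specific query text. All names here (`SelectQuery`, `to_sql`, `to_sparql`) are illustrative assumptions, not the actual MetaQL API:

```python
# Hypothetical sketch: one query object, compiled for two backends.
# This is NOT the MetaQL API, just an illustration of the concept.

class SelectQuery:
    def __init__(self, entity, constraints):
        self.entity = entity            # schema class name, e.g. "Document"
        self.constraints = constraints  # list of (property, value) pairs

    def to_sql(self):
        # Compile to a SQL SELECT over a relational backend.
        where = " AND ".join(f"{p} = '{v}'" for p, v in self.constraints)
        return f"SELECT * FROM {self.entity} WHERE {where}"

    def to_sparql(self):
        # Compile the same constraints to SPARQL triple patterns.
        patterns = " . ".join(
            f'?s <{self.entity}#{p}> "{v}"' for p, v in self.constraints)
        return f"SELECT ?s WHERE {{ {patterns} }}"

q = SelectQuery("Document", [("author", "alice")])
print(q.to_sql())
print(q.to_sparql())
```

The point of the pattern is that application code builds the query once, against the schema, and the layer handles each backend's syntax.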

More details:

Vital AI example apps for prediction using AlchemyAPI (IBM Bluemix), MetaMind, and Apache Spark

Along with our recent release of VDK 0.2.254, we’ve added a few new example apps to help developers get started with the VDK.

By starting with one of these examples, you can quickly build applications for prediction, classification, and recommendation with a JavaScript web application front end, and prediction models on the server. The examples use prediction models trained using Apache Spark or an external service such as AlchemyAPI (IBM Bluemix) or MetaMind.

There is also an example app demonstrating various queries against a document database containing the Enron Email dataset.  Some details on this dataset are here:

The example applications have the same architecture.


The components are:

  • JavaScript front end, using asynchronous messages to communicate with the server.  Messaging and domain model management are provided by the VitalService-JS library.
  • Vert.x application server, making use of the Vital-Vertx module.
  • VitalPrime server using DataScripts to implement server-side functionality, such as generating predictions using a Prediction Model.
  • Prediction Models to make predictions or recommendations.  A Prediction Model can be trained based on a training set, or it could interface to an external prediction service.  If trained, we often use Apache Spark with the Aspen library to create the trained prediction model.
  • A database, such as DynamoDB, AllegroGraph, or MongoDB, to store application data.
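The request path through these components can be pictured as an asynchronous message dispatched from the front end to a named server-side handler. The sketch below is purely illustrative (the handler name, message shape, and `handle_message` function are assumptions, not the Vital AI protocol):

```python
# Illustrative sketch of the architecture's request flow: a message
# from the JavaScript client is routed to a server-side handler
# (a stand-in for a VitalPrime DataScript), which calls a model.

def predict_datascript(payload):
    # Stand-in for a DataScript that invokes a prediction model.
    return {"label": "positive", "input": payload["text"]}

HANDLERS = {"predict": predict_datascript}

def handle_message(message):
    # The app server routes each incoming message to its handler
    # and returns the result to the client asynchronously.
    handler = HANDLERS[message["type"]]
    return {"status": "ok", "result": handler(message["payload"])}

reply = handle_message({"type": "predict", "payload": {"text": "great movie"}})
print(reply["status"], reply["result"]["label"])
```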

Here is a quick overview of some of the examples.

We’ll post detailed instructions on each app in followup blog entries.

MetaMind Image Classification App:

Source Code:

Demo Link:



This example uses a MetaMind prediction model to classify an image.

AlchemyAPI/IBM Bluemix Document Classification App

Source Code:

Demo Link:



This example app uses an AlchemyAPI (IBM Bluemix) prediction model to classify a document.

Movie Recommendation App

Source Code (Web Application):

Source Code (Training Prediction Model):

Demo Link:



This example uses a prediction model trained on the MovieLens data to recommend movies based on a user’s current movie ratings.  The prediction model uses Spark’s collaborative filtering (ALS) implementation, trained via an Apache Spark job.  Each user has a user-id, such as “1010” in the screenshot above.

Spark’s collaborative filtering implementation is described here:

The MovieLens data can be found here:
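The core idea behind the recommender is matrix factorization over (user, movie, rating) triples. Spark MLlib does this at scale with ALS; the following is a tiny pure-Python stand-in (stochastic gradient descent rather than ALS, and not the actual Spark job) just to show the technique:

```python
# Toy matrix factorization over (user, movie, rating) triples,
# the same data shape as the MovieLens input to Spark's ALS.
# Illustrative only; Spark MLlib's implementation is ALS at scale.

import random

def factorize(ratings, n_users, n_items, rank=2, steps=2000, lr=0.01, reg=0.02):
    random.seed(0)
    U = [[random.random() for _ in range(rank)] for _ in range(n_users)]
    V = [[random.random() for _ in range(rank)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            pred = sum(U[u][k] * V[i][k] for k in range(rank))
            err = r - pred
            for k in range(rank):
                # Gradient step with L2 regularization.
                U[u][k] += lr * (err * V[i][k] - reg * U[u][k])
                V[i][k] += lr * (err * U[u][k] - reg * V[i][k])
    return U, V

# (user, movie, rating) triples, like the MovieLens format.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 5), (2, 1, 1), (2, 2, 2)]
U, V = factorize(ratings, n_users=3, n_items=3)

def predict(u, i):
    return sum(U[u][k] * V[i][k] for k in range(2))

# Recommend by predicting the rating for a movie the user hasn't seen.
print(round(predict(1, 1), 1))
```

The learned user and item factor vectors play the same role as the factors ALS produces; a recommendation is simply the unseen items with the highest predicted ratings.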

Enron Document Search App

Source Code:

Demo Link:



This example demonstrates how to implement different queries against a database, such as a “select” query (find all documents with certain keywords) and a “graph” query (find documents that are linked to users).
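The difference between the two query styles can be sketched over an in-memory stand-in for the document database (the data and function names here are invented for illustration):

```python
# Sketch of the two query styles in the Enron example, over an
# in-memory stand-in for the document database (illustrative only).

documents = [
    {"id": "d1", "text": "meeting about the energy contract"},
    {"id": "d2", "text": "lunch plans"},
]
# Edges linking users to documents, as a graph query would traverse.
edges = [("user:alice", "d1"), ("user:bob", "d1")]

def select_query(keyword):
    # "Select" query: find all documents containing a keyword.
    return [d["id"] for d in documents if keyword in d["text"]]

def graph_query(user):
    # "Graph" query: find documents linked to a given user.
    return [doc for (u, doc) in edges if u == user]

print(select_query("contract"))   # ['d1']
print(graph_query("user:alice"))  # ['d1']
```

A select query filters objects by their properties, while a graph query follows edges between objects; MetaQL exposes both through the same layer.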

Example Data Visualizations:

The Cytoscape graph visualization tool can be used to visualize the above sample data using the Vital AI Cytoscape plugin.

The Cytoscape plugin is available from:

An example of visualizing the MovieLens data:


An example of visualizing the Wordnet Dataset, viewing the graph centered on “Red Wine”:


For generating and importing the Wordnet data, see sample code here:

Information about Wordnet is available here:

Another example of the Wordnet data, with some additional visual styles added:


Vital AI Dev Kit and Products Release 254

VDK 0.2.254 was recently released, as well as corresponding releases for each product.

The new release is available via the Dashboard:

Artifacts are in the maven repository:

Code is in the public github repos for public projects:

Highlights of the release include:

Vital AI Development Kit:

  • Support for deep domain model dependencies.
  • Full support for dynamic domain models (OWL to JVM and JSON-Schema).
  • Synchronization of domain models between local and remote vitalservice instances.
  • Service Operations DSL for version upgrade and downgrade to facilitate updating datasets during a domain model change.
  • Support for loading older/newer versions of a domain model to facilitate upgrading/downgrading datasets.
  • Configuration option to specify enforcement of version differences (strict, tolerant, lenient).
  • Able to specify preferred version of imported domain models.
  • Able to specify backward compatibility with prior domain model versions.
  • Support for deploy directories to cleanly separate domain models under development from those deployed in applications.
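The strict/tolerant/lenient enforcement option can be illustrated with a small version check. The exact semantics in the VDK are not documented here, so this sketch assumes strict = exact version match, tolerant = same major.minor, lenient = any version accepted:

```python
# Hypothetical sketch of version-difference enforcement modes.
# The mode semantics below are assumptions, not the VDK's definition:
#   strict   -> exact version match required
#   tolerant -> major.minor must match, patch may differ
#   lenient  -> any version accepted

def version_compatible(expected, actual, mode="strict"):
    e, a = expected.split("."), actual.split(".")
    if mode == "strict":
        return e == a
    if mode == "tolerant":
        return e[:2] == a[:2]   # major.minor must match
    if mode == "lenient":
        return True
    raise ValueError(f"unknown mode: {mode}")

print(version_compatible("0.2.254", "0.2.250", mode="tolerant"))  # True
print(version_compatible("0.2.254", "0.2.250", mode="strict"))    # False
```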


VitalPrime:

  • Full dynamic domain support.
  • Synchronization of domain models between client and server.
  • DataScripts to support domain model operations.
  • Support for segment-to-segment data upgrade/downgrade for domain model version changes.


Aspen:

  • Prediction models to support externally defined taxonomies.
  • Support for the AlchemyAPI prediction model.
  • Support for the MetaMind prediction model.
  • Support for dynamic domain loading in Spark.
  • Added jobs for upgrading/downgrading datasets for version changes.

Vital Utilities:

  • Import and Export scripts using bulk operations of VitalService
  • Data migration script for updating dataset upon version change

Vital Vertx and VitalService-JS:

  • Support for dynamic domain models in JSON-Schema.
  • Asynchronous stream support, including multi-part data transfers (file upload/download in parts).
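The multi-part transfer feature boils down to splitting a payload into fixed-size parts that are sent independently and reassembled on the other side. This sketch shows only that splitting/reassembly idea, not the actual VitalService-JS wire protocol:

```python
# Illustrative sketch of multi-part transfer: split a payload into
# fixed-size parts, then reassemble them in order (not the actual
# VitalService-JS protocol).

def split_parts(data: bytes, part_size: int):
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]

def reassemble(parts):
    return b"".join(parts)

payload = b"x" * 10_000
parts = split_parts(payload, 4096)
print(len(parts))                    # 3 parts
print(reassemble(parts) == payload)  # True
```

Transferring parts independently is what lets a large file upload or download proceed asynchronously and resume part by part.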

Vital Triplestore:

  • Support for EQ_CASE_INSENSITIVE comparator
  • Support for AllegroGraph 4.14.1