SmartData, NoSQL Now! conference talk: MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark

I’m excited for my talk tomorrow at the NoSQL Now! conference!  I hope those in the San Francisco/San Jose area can make it; if you do, please join me on Tuesday at 2pm!

— Marc Hadfield

Further details:

Tuesday, August 18, 2015

02:00 PM – 02:45 PM

SmartData, NoSQL Now! conference in San Jose, California

SmartData conference: http://smartdata2015.dataversity.net/

NoSQL Now conference: http://nosql2015.dataversity.net/

MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark

Each database has its strengths and weaknesses for different data access profiles, and we should endeavor to use the right tool for the right job.

However, adding another infrastructure component greatly increases not only the management effort, but also the development effort to integrate and maintain connections across multiple data repositories, to say nothing of keeping the data synchronized.

In this talk, we’ll discuss MetaQL, a common query layer across database technologies including NoSQL, SQL, Sparql, and Spark.

Using a common query layer lessens the burden on developers, allows using the right database for the right job, and opens up data to analysis that was previously unavailable, providing new and unexpected value.

In this talk, we will discuss:

  • Database Data Model and Schema
  • Java/JVM Query Builder, driven by Schema
  • Query constructs for “Select”, “Graph”, “Path”, and “Aggregation” Queries
  • NoSQL & SQL Databases
  • Sparql RDF Databases, including Allegrograph
  • Apache Spark, including SparkSQL and the new DataFrame API
  • ETL, Transactions, and Data Synchronization
  • Seamless queries across databases
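
To make the query-builder idea concrete, here is a toy sketch of a schema-driven builder that renders one logical query for two different backends (SQL and SPARQL). The class and method names below are hypothetical, purely for illustration; they are not the MetaQL API.

```java
// Hypothetical sketch: one builder, one logical query, multiple backends.
// Names (SelectQueryBuilder, type, where) are illustrative, not the real API.
import java.util.ArrayList;
import java.util.List;

class SelectQueryBuilder {
    private final List<String> constraints = new ArrayList<>();
    private String type;
    private int limit = 100;

    SelectQueryBuilder type(String t) { this.type = t; return this; }
    SelectQueryBuilder where(String c) { constraints.add(c); return this; }
    SelectQueryBuilder limit(int n) { this.limit = n; return this; }

    // Render the logical query as SQL.
    String toSql() {
        String where = constraints.isEmpty()
                ? "" : " WHERE " + String.join(" AND ", constraints);
        return "SELECT * FROM " + type + where + " LIMIT " + limit;
    }

    // Render the same logical query as SPARQL.
    String toSparql() {
        StringBuilder sb = new StringBuilder("SELECT ?s WHERE { ?s a <" + type + "> . ");
        for (String c : constraints) sb.append("FILTER(").append(c).append(") ");
        return sb.append("} LIMIT ").append(limit).toString();
    }
}
```

For example, `new SelectQueryBuilder().type("Document").where("year > 2014").limit(10).toSql()` yields `SELECT * FROM Document WHERE year > 2014 LIMIT 10`, while `toSparql()` renders the same constraint as a SPARQL FILTER.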

More details: http://nosql2015.dataversity.net/sessionPop.cfm?confid=90&proposalid=7823

Vital AI example apps for prediction using AlchemyAPI (IBM Bluemix), Metamind.io, and Apache Spark

Along with our recent release of VDK 0.2.254, we’ve added a few new example apps to help developers get started with the VDK.

By starting with one of these examples, you can quickly build applications for prediction, classification, and recommendation with a JavaScript web application front end, and prediction models on the server.  The examples use prediction models trained using Apache Spark or an external service such as AlchemyAPI (IBM Bluemix), or Metamind.io.

There is also an example app demonstrating various queries against a document database containing the Enron Email dataset.  Some details on this dataset are here: https://www.cs.cmu.edu/~./enron/

The example applications have the same architecture.

[Image: example application architecture diagram]

The components are:

  • JavaScript front end, using asynchronous messages to communicate with the server.  Messaging and domain model management are provided by the VitalService-JS library.
  • Vert.x application server, making use of the Vital-Vertx module.
  • VitalPrime server using DataScripts to implement server-side functionality, such as generating predictions using a Prediction Model.
  • Prediction Models to make predictions or recommendations.  A Prediction Model can be trained on a training set, or it can interface with an external prediction service.  If trained, we often use Apache Spark with the Aspen library to create the trained prediction model.
  • A Database such as DynamoDB, Allegrograph, MongoDB, or other to store application data.

Here is a quick overview of some of the examples.

We’ll post detailed instructions on each app in followup blog entries.

MetaMind Image Classification App:

Source Code:

https://github.com/vital-ai/vital-examples/tree/master/metamind-app

Demo Link:

https://demos.vital.ai/metamind-app/index.html

Screenshot:

[screenshot]

This example uses a MetaMind ( https://www.metamind.io/ ) prediction model to classify an image.

AlchemyAPI/IBM Bluemix Document Classification App

Source Code:

https://github.com/vital-ai/vital-examples/tree/master/alchemyapi-app

Demo Link:

https://demos.vital.ai/alchemyapi-app/index.html

Screenshot:

[screenshot]

This example app uses an AlchemyAPI (IBM Bluemix) prediction model to classify a document.

Movie Recommendation App

Source Code (Web Application):

https://github.com/vital-ai/vital-examples/tree/master/movie-recommendations-js-app

Source Code (Training Prediction Model):

https://github.com/vital-ai/vital-examples/tree/master/movie-recommendations

Demo Link:

https://demos.vital.ai/movie-recommendations-js-app/index.html

Screenshot:

[screenshot]

This example uses a prediction model trained on the MovieLens data to recommend movies based on a user’s current movie ratings.  The prediction model uses collaborative filtering and is trained by an Apache Spark job.  Each user has a user-id, such as “1010” in the screenshot above.

Spark’s collaborative filtering implementation is described here:

http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html
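
As a rough illustration of what collaborative filtering does, the toy sketch below predicts a user's rating for an unseen movie as a similarity-weighted average of other users' ratings. Note that Spark's MLlib implementation uses ALS matrix factorization, which is considerably more sophisticated; this sketch only conveys the underlying intuition.

```java
// Toy user-based collaborative filtering: predict a rating from similar users.
// Not Spark's ALS implementation; a minimal sketch of the general idea.
import java.util.HashMap;
import java.util.Map;

class ToyCF {
    // Cosine similarity between two users' rating vectors
    // (dot product over co-rated movies, normalized by each user's magnitude).
    static double similarity(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double r = b.get(e.getKey());
            if (r != null) dot += e.getValue() * r;
            na += e.getValue() * e.getValue();
        }
        for (double r : b.values()) nb += r * r;
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Predict `user`'s rating for `movie` as a similarity-weighted average
    // over all other users who rated that movie.
    static double predict(Map<Integer, Map<Integer, Double>> ratings, int user, int movie) {
        double num = 0, den = 0;
        Map<Integer, Double> mine = ratings.get(user);
        for (Map.Entry<Integer, Map<Integer, Double>> e : ratings.entrySet()) {
            if (e.getKey() == user) continue;
            Double r = e.getValue().get(movie);
            if (r == null) continue;
            double sim = similarity(mine, e.getValue());
            num += sim * r;
            den += Math.abs(sim);
        }
        return den == 0 ? 0 : num / den;
    }
}
```

A user whose ratings closely match yours contributes more to the predicted score; ALS reaches a similar goal by factoring the full user-by-movie rating matrix into latent factors.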

The MovieLens data can be found here:

http://grouplens.org/datasets/movielens/

Enron Document Search App

Source Code:

https://github.com/vital-ai/vital-examples/tree/master/enron-js-app

Demo Link:

https://demos.vital.ai/enron-js-app/index.html

Screenshot:

[screenshot]

This example demonstrates how to implement different queries against a database, such as a “select” query (find all documents containing certain keywords) and a “graph” query (find documents that are linked to users).
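
To make the two query shapes concrete, here is an illustrative in-memory sketch. This is not the Vital AI query API; the classes and method names are hypothetical stand-ins for a document store with user-to-document links.

```java
// Illustrative sketch of "select" vs "graph" queries over a tiny in-memory store.
// Doc, Link, select, graph are hypothetical names, not the Vital AI API.
import java.util.ArrayList;
import java.util.List;

class QueryShapes {
    static class Doc {
        final String id, text;
        Doc(String id, String text) { this.id = id; this.text = text; }
    }
    static class Link {
        final String user, docId;   // e.g. user "authored" document
        Link(String user, String docId) { this.user = user; this.docId = docId; }
    }

    // "Select" query: all documents containing a keyword.
    static List<Doc> select(List<Doc> docs, String keyword) {
        List<Doc> out = new ArrayList<>();
        for (Doc d : docs) if (d.text.contains(keyword)) out.add(d);
        return out;
    }

    // "Graph" query: documents reachable from a user via links.
    static List<Doc> graph(List<Doc> docs, List<Link> links, String user) {
        List<Doc> out = new ArrayList<>();
        for (Link l : links)
            if (l.user.equals(user))
                for (Doc d : docs)
                    if (d.id.equals(l.docId)) out.add(d);
        return out;
    }
}
```

The select query filters on object properties, while the graph query traverses edges between objects; a graph-aware backend answers the second shape without the nested scan used in this sketch.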

Example Data Visualizations:

The Cytoscape graph visualization tool can be used to visualize the above sample data using the Vital AI Cytoscape plugin.

The Cytoscape plugin is available from:

https://github.com/vital-ai/vital-cytoscape

An example of visualizing the MovieLens data:

[Image: MovieLens data visualization]

An example of visualizing the Wordnet Dataset, viewing the graph centered on “Red Wine”:

[Image: Wordnet graph centered on “Red Wine”]

For generating and importing the Wordnet data, see sample code here:

https://github.com/vital-ai/vital-examples/tree/master/vital-samples/src/main/groovy/ai/vital/samples

Information about Wordnet is available here:

https://wordnet.princeton.edu/

Another example of the Wordnet data, with some additional visual styles added:

[Image: Wordnet graph with additional visual styles]

Vital AI Dev Kit and Products Release 254

VDK 0.2.254 was recently released, as well as corresponding releases for each product.

The new release is available via the Dashboard:

https://dashboard.vital.ai

Artifacts are in the Maven repository:

https://github.com/vital-ai/vital-public-mvn-repo/tree/releases/vital-ai

Source code for public projects is in the public GitHub repos:

https://github.com/orgs/vital-ai

Highlights of the release include:

Vital AI Development Kit:

  • Support for deep domain model dependencies.
  • Full support for dynamic domain models (OWL to JVM and JSON-Schema)
  • Synchronization of domain models between local and remote vitalservice instances.
  • Service Operations DSL for version upgrade and downgrade to facilitate updating datasets during a domain model change.
  • Support for loading older/newer versions of a domain model to facilitate upgrading/downgrading datasets.
  • Configuration option to specify enforcement of version differences (strict, tolerant, lenient).
  • Able to specify preferred version of imported domain models.
  • Able to specify backward compatibility with prior domain model versions.
  • Support for deploy directories to cleanly separate domain models under development from those deployed in applications.

VitalPrime:

  • Full dynamic domain support
  • Synchronization of domain models between client and server
  • Datascripts to support domain model operations
  • Support for segment-to-segment data upgrade/downgrade for domain model version changes.

Aspen:

  • Prediction models to support externally defined taxonomies.
  • Support of AlchemyAPI prediction model
  • Support of MetaMind prediction model
  • Support for dynamic domain loading in Spark
  • Added jobs for upgrading/downgrading datasets for version change.

Vital Utilities:

  • Import and Export scripts using bulk operations of VitalService
  • Data migration script for updating dataset upon version change

Vital Vertx and VitalService-JS:

  • Support for dynamic domain models in JSON-Schema.
  • Asynchronous stream support, including multi-part data transfers (file upload/download in parts).

Vital Triplestore:

  • Support for EQ_CASE_INSENSITIVE comparator
  • Support for Allegrograph 4.14.1

Amazon Echo tells “Yo Mama…” jokes.

To experiment with the Amazon Echo API, we created a little app called “Funnybot”.

The details of the app can be found in the previous post here:

https://vitalai.com/2015/07/building-an-amazon-echo-app-with-vital-ai.html

All the source code of the app can be found on github here:

https://github.com/vital-ai/vital-examples/tree/master/amazon-echo-humor-app

The Vital AI components are available here: https://dashboard.vital.ai/

You may also notice a Raspberry Pi in the video; we’re in the midst of integrating the Echo with the Raspberry Pi for a home automation application.

Building an Amazon Echo App with Vital AI

A recent delivery from Amazon brought us an Amazon Echo. After some initial fun experimentation, including hooking it up to a Belkin Wemo switch and my Pandora account, we dove into the developer API to hook it up to the Vital AI platform and see what we could do.

Note: All the code discussed below can be found on github: https://github.com/vital-ai/vital-examples/tree/master/amazon-echo-humor-app

The Voice Interface configuration is somewhat similar to the Wit.ai API, and setting things up on the Amazon side was easy enough. Some Amazon-provided Java code gave us a good start on the webservice backend.

We often use Vert.x ( http://vertx.io/ ) for web applications, and we created the REST webservice using it.

We configured this webservice to communicate with our VitalPrime server, which itself is configured with a database for storage.

So, the final architecture is:

[Image: Echo app architecture diagram]

When the Echo gets a request like “Alexa, launch <your-app-name>”, the Echo communicates with the Amazon Echo service, which in turn communicates with the app, in our case implemented as the Vert.x webservice.

The Webservice makes an API call to the Prime server, which uses a datascript to fulfill the request.

The datascript picks a random joke from those in its database, and replies to Echo with the joke.
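
The logic of that datascript can be sketched as follows (illustrative Java, not the actual Prime datascript code): query all jokes once, cache the list, and answer each request with a random pick from the cache.

```java
// Sketch of the datascript's cache-and-pick logic. The class name and the
// stubbed queryAllJokes() are hypothetical; in the real app the jokes come
// from a VitalSelectQuery against the 'humor-app' segment.
import java.util.List;
import java.util.Random;

class JokeService {
    private List<String> cache;               // filled on first request
    private final Random random = new Random();

    private List<String> queryAllJokes() {
        // Stand-in for the database query shown below.
        return List.of("Yo mama joke 1", "Yo mama joke 2", "Yo mama joke 3");
    }

    public String randomJoke() {
        if (cache == null) cache = queryAllJokes();   // query once, then reuse
        return cache.get(random.nextInt(cache.size()));
    }
}
```

Caching is reasonable here because the joke set is small and static; a larger or frequently-updated dataset would instead query per request or expire the cache.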

We loaded in a few hundred “Yo Mama” jokes as a starting point, and called the app “Funnybot” in an homage to a rather terrible episode of South Park.

In this case the backend was a pretty simple database lookup, but a more realistic example would include a “serving” layer as well as a scalable streaming and “analytics” layer.

In this case, the architecture would look like:

[Image: extended architecture with streaming and analytics layers]

Here we are using Apache Spark as the streaming layer (Spark Streaming) and the analytics layer (such as Spark GraphX or MLlib).  Instead of Spark, one could also use Apache Storm for streaming with the same basic architecture.

Aspen Datawarehouse is an open-source collection of software on top of Apache Spark that helps connect streaming and analysis on Spark to the front end of the architecture, mainly by keeping the data model consistent and providing integration hooks; this gives us a clean handoff among Vert.x, Prime, and Spark.

Datascripts are scripts running within Prime, typically implementing logic for an application that is close to the data.

In this case, the datascript is doing a query for all the Jokes in the database, caching them, and randomly picking one from the cached list.

The query is:

VitalSelectQuery vsq = new VitalBuilder().query {

    SELECT {
        value segments: ['humor-app']
        value offset: 0
        value limit: 10000
        node_constraint { Joke.class }
    }

}.toQuery()

We’ve created a simple domain model which includes a class for “Joke”, and by constraining the query to “Joke” objects, we get back all the jokes that are within the “humor-app” database.

All the implementation code can be found on github here: https://github.com/vital-ai/vital-examples/tree/master/amazon-echo-humor-app

The Vital AI components mentioned above are available from Vital AI here: https://dashboard.vital.ai/

Please contact us at info@vital.ai if you would like assistance creating an Amazon Echo application.

In my next post I’ll show a quick video of the result.

We’re currently in the process of hooking the Echo to a Raspberry Pi via the Vital AI platform for a home automation application — stay tuned!

Join Vital AI at NY Tech Day Tomorrow!

Come to NY Tech Day tomorrow (Thursday, April 23rd) and stop by our booth!


NY Tech Day is an annual start-up extravaganza with over 400 companies presenting and over 10,000 attendees.  It’s always a very exciting day.

It’s free to attend and is held at Pier 92 on the west side of Manhattan (around 54th Street, on the Hudson River).

More details are available at: https://techdayhq.com/new-york

Hope to see you there!