Vital AI Cytoscape App in App Store

We recently published our Cytoscape App to the Cytoscape App Store.

Cytoscape is a wonderful open-source graph visualization tool that runs on the desktop and is quite handy for graph analysis and visualization.

Our plugin allows Cytoscape to connect to databases, servers, and Apache Spark/Hadoop using the VitalService API.

The plugin is available directly in Cytoscape, or here:

Cytoscape is available here:

Prior to using the plugin, the Vital AI software must be installed and configured.  The Vital AI software can be downloaded from here:

The Cytoscape plugin uses the VITAL_HOME environment variable to find the Vital AI software and configuration files.

On Mac OS X, desktop applications like Cytoscape need some extra help to see environment variables.

Here is a good StackOverflow answer describing how to make environment variables visible to OS X applications:
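Once VITAL_HOME is set, a JVM-based tool can check whether the variable actually reached its process.  A minimal, generic Java sketch (the resolver method is our own illustration, not part of the Vital AI API):

```java
import java.util.Map;

public class CheckVitalHome {

    // Returns the configured VITAL_HOME, or null when it is missing or blank.
    static String resolveVitalHome(Map<String, String> env) {
        String home = env.get("VITAL_HOME");
        return (home == null || home.isEmpty()) ? null : home;
    }

    public static void main(String[] args) {
        String home = resolveVitalHome(System.getenv());
        if (home == null) {
            // Apps launched from Finder don't inherit shell exports, so a null
            // here usually means the variable was only set in ~/.bash_profile.
            System.out.println("VITAL_HOME is not visible to this process");
        } else {
            System.out.println("VITAL_HOME = " + home);
        }
    }
}
```

Running this from Cytoscape's environment is a quick way to tell whether the StackOverflow fix took effect.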

The first tab in the Vital AI plugin enables selecting which VitalService endpoint to connect to.  These come from the VitalService configuration file found at:


For Prime endpoints, an authorization key is used to connect.  For convenience, you can put such keys into your vitalsigns configuration file, like so:

config: {
    local-key: key1-key1-key1
}


The naming convention is: “vitalservicename”-key

So, the corresponding VitalService configuration entry would be:

profile.local {
    type = VitalPrime
    appID = analytics
    VitalPrime {
        endpointURL = ""
    }
}

Now back to the plugin…


Here is the Connection tab.  Select the desired VitalService endpoint from the drop-down and hit “Connect”.

Now that we have connected, we can use the “Search” tab to search.


We can select which databases (“Segments”) to include, as well as what property to search.  In this case the “Wordnet” database is selected and the “name” property.

Let’s put all the results into a network.



Now let’s select them all and find what is connected to them.  This is called an “expansion” query and looks for everything connected to the starting node up to two hops (edges) away.


Starting the expansion…


Expanding all the selected nodes…  The “Paths” tab is used to select whether the expansion will be for one hop (one edge) or two hops (two edges), the direction of the desired edges (forward, backward, or both), and which “Segments” to include in the expansion.
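To make the hop and direction options concrete, here is a small generic sketch of what a one- or two-hop expansion computes over a set of directed edges.  This illustrates the idea only and is not the plugin's actual implementation:

```java
import java.util.*;

public class Expansion {

    // A directed edge in the graph: source URI -> destination URI.
    record Edge(String source, String destination) {}

    enum Direction { FORWARD, BACKWARD, BOTH }

    // Collects every node reachable from the start nodes within maxHops edges,
    // following edges only in the requested direction.
    static Set<String> expand(Set<String> start, List<Edge> edges,
                              int maxHops, Direction dir) {
        Set<String> visited = new HashSet<>(start);
        Set<String> frontier = new HashSet<>(start);
        for (int hop = 0; hop < maxHops; hop++) {
            Set<String> next = new HashSet<>();
            for (Edge e : edges) {
                // forward: follow the edge from its source to its destination
                if (frontier.contains(e.source())
                        && (dir == Direction.FORWARD || dir == Direction.BOTH)) {
                    next.add(e.destination());
                }
                // backward: follow the edge against its direction
                if (frontier.contains(e.destination())
                        && (dir == Direction.BACKWARD || dir == Direction.BOTH)) {
                    next.add(e.source());
                }
            }
            next.removeAll(visited);
            visited.addAll(next);
            frontier = next;   // the next hop starts from the newly found nodes
        }
        return visited;
    }
}
```

Choosing two hops simply runs the frontier loop twice; choosing “both” directions treats each edge as undirected.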


And now we have some results!


Let’s zoom in on part of the network.  We can then further analyze the results, continue to explore and expand the network, or tune the visualization.




If we are connected to a Prime endpoint, we can use the Paths tab to select the node and edge types to filter with during an expansion query.



Also with a Prime endpoint, we can use the “DataScripts” tab to run datascripts on the server.  Datascripts can be used to analyze data, trigger Spark or Hadoop jobs, use a prediction model, or anything you like.

Please send along any comments or questions, and we hope you enjoy using Cytoscape and our plugin to visualize your data.


Using the Beaker Notebook with Vital Service

In this post I’ll describe using the Beaker data science notebook with Vital back-end components for data exploration and analysis, using the Wordnet dataset as an example.

At Vital AI we use many tools to explore and analyze data, chief among them data science notebooks.  Examples include IPython/Jupyter and Zeppelin, plus similar products and services such as Databricks and RStudio.

One that has become a recent favorite is the Beaker Notebook ( ).  Beaker is open-source with a git repo on github ( ) under very active development.

Beaker fully embraces polyglot programming and supports many programming languages, including JavaScript, Python, and R, as well as JVM languages such as Groovy, Java, and Scala.

Scala is especially nice for integration with Apache Spark.  R of course is great for stats and visualization, and JavaScript is convenient for visualization and web dashboards, especially when using visualization libraries like D3.

At Vital AI we typically use JVM for production server applications in combination with Apache Spark, so having all options available in a single notebook makes data analysis a very agile process.

About Data Models

At Vital AI we model data by creating data models (aka ontologies) to capture the meaning of the data, and use these data models within our code.  This allows the meaning of the data to guide our analysis, as well as enable strong data standards – saving a huge amount of manual effort.

We create data models using the open standard OWL, and then generate code using the VitalSigns tool.  This data model code is then utilized within all data analysis and workflows.

At runtime, VitalSigns loads data models into the JVM in one of two ways: statically, from the classpath specified when the JVM started, via the ServiceLoader API; or dynamically, via a dynamic classloader.
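The classpath route relies on Java's standard ServiceLoader mechanism.  A generic sketch of the pattern (the provider interface here is hypothetical, standing in for the VitalSigns-generated classes):

```java
import java.util.ServiceLoader;

public class ModelLoaderDemo {

    // Hypothetical provider interface; the real domain-model classes are
    // generated by VitalSigns and are not shown here.
    public interface DomainModelProvider {
        String modelName();
    }

    public static void main(String[] args) {
        // ServiceLoader discovers implementations listed in
        // META-INF/services/<interface-name> files on the classpath.
        ServiceLoader<DomainModelProvider> loader =
                ServiceLoader.load(DomainModelProvider.class);
        int count = 0;
        for (DomainModelProvider p : loader) {
            System.out.println("loaded model: " + p.modelName());
            count++;
        }
        // With no provider files on the classpath, the loader is simply empty.
        System.out.println(count + " model(s) discovered");
    }
}
```

The dynamic route sidesteps this static registration entirely, which is what lets Prime push model updates at run-time.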

By using the dynamic method, we can use the Vital Prime server as a “data model server” so that data models are discovered and loaded at run-time from the Prime server.  Thus the data models are kept in sync with data managed by the Prime server, so data analysis is always working with the latest data definitions.

As Groovy is a dynamic language on the JVM, we use Groovy for many data analysis scripts that use data models.

About Wordnet

One of our favorite datasets to use is Wordnet.  From the Wordnet website ( ):

WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download. WordNet’s structure makes it a useful tool for computational linguistics and natural language processing.

As Wordnet has the form of a graph – words linked to words linked to other words – it is very convenient for visualization.

VitalService API

The VitalService API is a standard API that includes methods for data queries, running analysis scripts (aka datascripts), and reading, saving, updating, and deleting data (so-called “CRUD” operations).  We use the VitalService API for working with data locally, accessing a database, or using a remote service.  This means we use the same API calls when we switch from working with data locally to working with a production service, so we can use the same code library throughout.
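The value of a uniform API across local and remote backends can be sketched with a toy interface.  These names are hypothetical illustrations, not the actual VitalService signatures:

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical illustration of a uniform data-access interface: the same
// calls work whether the backing store is local, a database, or remote.
interface DataService {
    Optional<String> get(String uri);          // read
    void save(String uri, String value);       // create / update
    void delete(String uri);                   // delete
    List<String> query(String queryString);    // query
}

// A trivial in-memory implementation; a remote implementation would expose
// the identical interface, so client code is unchanged when switching.
class InMemoryDataService implements DataService {
    private final java.util.Map<String, String> store = new java.util.HashMap<>();

    public Optional<String> get(String uri) { return Optional.ofNullable(store.get(uri)); }
    public void save(String uri, String value) { store.put(uri, value); }
    public void delete(String uri) { store.remove(uri); }
    public List<String> query(String q) {
        return store.values().stream()
                .filter(v -> v.contains(q))
                .collect(Collectors.toList());
    }
}
```

Code written against the interface never needs to know which implementation it is talking to, which is the property the post describes.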

Application Architecture


A full application stack may include a web application layer, a VitalService implementation such as Prime, a database such as DynamoDB, and an analysis environment based on Apache Spark and Hadoop.  Above, Prime manages the data models (the “gear” icons) and provides “datascripts” to process data via a scripting interface.



The above diagram focuses on the current case of Beaker Notebook where we are connecting to VitalService Prime as a client, synchronizing the data models, and sending queries to an underlying database.  In our example, the database contains the Wordnet data.

Some sample code to generate the Wordnet dataset is here:

And some sample code to load the Wordnet data into VitalService is here:

Or more generally the vitalimport utility could be used, found in here:

Back to the Beaker Notebook

Now that we have a few definitions out of the way, we can get back to using Beaker.

The previous version of Beaker had some limitations in loading JVM classes (see: ), which are now fixed in Beaker’s github repository but not yet included in a released version.  We’re currently using a patched version, available here: , until the next release.

For this example, let’s query some data using VitalService and then visualize the resulting data using D3 with JavaScript.

Our example is based on the one found here:

Our example will include three cells: a Groovy cell to run a query, a JavaScript cell to run D3 over the data, and an HTML cell to display the resulting graph.

The Groovy cell first connects to VitalService, like so:

VitalSigns vs = VitalSigns.get()

VitalServiceKey key = new VitalServiceKey().generateURI()
key.key = vs.getConfig("analyticsKey")

def service = VitalServiceFactory.openService(key, "prime", "AnalyticsService")

The code above initializes VitalSigns, sets an authentication key based upon a value in a configuration file, and connects to the VitalService endpoint.  Prime requires an authentication key for security.

vs.pipeline { ->

    def builder = new VitalBuilder()

    VitalGraphQuery q = builder.query {

        // query for graphs like:
        // node1(name:happy) ---edge---> node2

        value segments: ["wordnet"]
        value inlineObjects: true

        // bind this node to the name "node1"
        node_bind { "node1" }

        // include subclasses of SynsetNode: Noun, Verb, Adjective, Adverb
        node_constraint { SynsetNode.expandSubclasses(true) }
        node_constraint { SynsetNode.props().name.equalTo("happy") }

        // bind the edge and far node to the names "edge" and "node2"
        edge_bind { "edge" }
        node_bind { "node2" }
    }

    ResultList list = service.query( q )

    // count the results
    def j = 1

    list.each {

        // use the binding names to get the URI values out of the GraphMatch

        def node1_uri = it."node1".toString()
        def edge_uri = it."edge".toString()
        def node2_uri = it."node2".toString()

        // inlineObjects is true, which embeds unseen objects into the results;
        // on a cache miss, get the graph object out of the GraphMatch results,
        // referenced via its URI

        def node1 = vs.getFromCache(node1_uri) ?: it."$node1_uri"
        def edge = vs.getFromCache(edge_uri) ?: it."$edge_uri"
        def node2 = vs.getFromCache(node2_uri) ?: it."$node2_uri"

        // add new objects into the cache; refreshing existing ones doesn't hurt
        vs.addToCache([node1, edge, node2])

        // print node1 ---edge--> node2, with the edge type (minus the namespace)
        println j++ + ": " + + "---" +
            (edge.vitaltype.toString() - "") + "-->" +
    }
}
The above code performs a query for all Wordnet entries with the name “happy”, and then follows all links from those to other words, putting the results into a cache, as well as printing them out.

Note the use of data model objects in the code above, such as “SynsetNode”, “VITAL_Node”, and “VITAL_Edge”.  Using these avoids writing any code that directly parses data: the analysis code receives data objects which are “typed” according to the data model.

A screenshot:


The result of the “println” statements is:

1: happy---Edge_WordnetSimilarTo-->laughing, riant
2: happy---Edge_WordnetAlsoSee-->joyful
3: happy---Edge_WordnetAlsoSee-->joyous
4: happy---Edge_WordnetSimilarTo-->golden, halcyon, prosperous
5: happy---Edge_WordnetAttribute-->happiness, felicity
6: happy---Edge_WordnetAlsoSee-->euphoric
7: happy---Edge_WordnetAlsoSee-->elated
8: happy---Edge_WordnetAlsoSee-->cheerful
9: happy---Edge_WordnetAlsoSee-->felicitous
10: happy---Edge_WordnetAlsoSee-->glad
11: happy---Edge_WordnetAlsoSee-->contented, content
12: happy---Edge_WordnetSimilarTo-->blissful
13: happy---Edge_WordnetSimilarTo-->blessed
14: happy---Edge_WordnetSimilarTo-->bright
15: happy---Edge_WordnetAttribute-->happiness

We then take all the nodes and edges in the cache and turn them into JSON data as D3 expects.

def nodes = []
def links = []

Iterator i = vs.getCacheIterator()

def c = 0

while( i.hasNext() ) {

    GraphObject g =

    if( g.isSubTypeOf(VITAL_Node) ) {
        // remember this node's index so the edges can reference it
        g."local:index" = c
        nodes.add( "{\"name\": \"${}\", \"group\": $c}" )
        c++
    }
}

def max = c

i = vs.getCacheIterator()

while( i.hasNext() ) {

    GraphObject g =

    if( g.isSubTypeOf(VITAL_Edge) ) {

        def srcURI = g.sourceURI
        def destURI = g.destinationURI

        def source = vs.getFromCache(srcURI)
        def destination = vs.getFromCache(destURI)

        def sourceIndex = source."local:index"
        def destinationIndex = destination."local:index"

        links.add( "{\"source\": $sourceIndex, \"target\": $destinationIndex, \"value\": 10}" )
    }
}

println "Graph:" + "{\"nodes\": $nodes, \"links\": $links}"

beaker.graph = "{\"nodes\": $nodes, \"links\": $links}"


The last line above puts the data into a “beaker” object, which is the handoff point to other languages.

A screenshot of the results and the JSON:


Then in a Javascript cell:

var graphstr = JSON.stringify(beaker.graph);

var graph = JSON.parse(graphstr)

var width = 800,
    height = 300;

var color = d3.scale.category20();

var force = d3.layout.force()
    .size([width, height]);

var svg ="#fdg").append("svg")
    .attr("width", width)
    .attr("height", height);

var drawGraph = function(graph) {

  force
      .nodes(graph.nodes)
      .links(graph.links)
      .start();

  var link = svg.selectAll(".link")
      .enter().append("line")
      .attr("class", "link")
      .style("stroke-width", function(d) { return Math.sqrt(d.value); });

  var gnodes = svg.selectAll('g.gnode')
      .enter().append('g')
      .classed('gnode', true);

  var node = gnodes.append("circle")
      .attr("class", "node")
      .attr("r", 10)
      .style("fill", function(d) { return color(; });

  var labels = gnodes.append("text")
      .text(function(d) { return; });

  force.on("tick", function() {
    link.attr("x1", function(d) { return d.source.x; })
        .attr("y1", function(d) { return d.source.y; })
        .attr("x2", function(d) { return; })
        .attr("y2", function(d) { return; });

    gnodes.attr("transform", function(d) {
        return 'translate(' + [d.x, d.y] + ')';
    });
  });
};

drawGraph(graph);

Note the handoff of the “beaker.graph” object at the beginning of the JavaScript code.  It may be a bit tricky to get the data exchanges right so that JSON produced on the Groovy side is interpreted as JSON on the JavaScript side, and vice versa.  Beaker provides auto-translation for various data structures, including DataFrames, but it still takes some trial and error to get it right.

The above JavaScript code comes from the Beaker example project, plus this StackOverflow answer, which discusses adding labels to graphs:

In the last Beaker cell, we include the HTML to be the “target” of the JavaScript code:

<style>
.node {
  stroke: #fff;
  stroke-width: 1.5px;
}

.link {
  stroke: #999;
  stroke-opacity: .6;
}
</style>

<div id="fdg"></div>

And a screenshot of the HTML cell with the resulting D3 graph.


Hope you have enjoyed this walkthrough of using Beaker with the VitalService interface, and visualizing query results in a graph with D3.

Please ask any questions in the comments section, or send them to us at

Happy New Year!

Vital AI

Vital AI example apps for prediction using AlchemyAPI (IBM Bluemix),, and Apache Spark

Along with our recent release of VDK 0.2.254, we’ve added a few new example apps to help developers get started with the VDK.

By starting with one of these examples, you can quickly build applications for prediction, classification, and recommendation with a JavaScript web application front end, and prediction models on the server.  The examples use prediction models trained using Apache Spark or an external service such as AlchemyAPI (IBM Bluemix), or

There is also an example app for various queries of a document database containing the Enron Email dataset.  Some details on this dataset are here:

The example applications all share the same architecture.


The components are:

  • JavaScript front end, using asynchronous messages to communicate with the server.  Messaging and domain model management are provided by the VitalService-JS library.
  • Vert.x application server, making use of the Vital-Vertx module.
  • VitalPrime server using DataScripts to implement server-side functionality, such as generating predictions using a Prediction Model.
  • Prediction Models to make predictions or recommendations.  A Prediction Model can be trained based on a training set, or it could interface to an external prediction service.  If trained, we often use Apache Spark with the Aspen library to create the trained prediction model.
  • A Database such as DynamoDB, Allegrograph, MongoDB, or other to store application data.

Here is a quick overview of some of the examples.

We’ll post detailed instructions on each app in followup blog entries.

MetaMind Image Classification App:

Source Code:

Demo Link:



This example uses a MetaMind ( ) prediction model to classify an image.

AlchemyAPI/IBM Bluemix Document Classification App

Source Code:

Demo Link:



This example app uses an AlchemyAPI (IBM Bluemix) prediction model to classify a document.

Movie Recommendation App

Source Code (Web Application):

Source Code (Training Prediction Model):

Demo Link:



This example uses a prediction model trained on the MovieLens data to recommend movies based on a user’s current movie ratings.  The prediction model uses the Collaborative Filtering algorithm trained using an Apache Spark job.  Each user has a user-id such as “1010” in the screenshot above.
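As a toy illustration of the collaborative-filtering idea (not Spark’s ALS implementation), here is a tiny user-based recommender using cosine similarity over a ratings matrix:

```java
public class TinyRecommender {

    // Cosine similarity between two users' rating vectors (0.0 = unrated).
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Recommend the unrated item whose similarity-weighted score is highest:
    // each other user's rating counts in proportion to how similar that user
    // is to the target user.
    static int recommend(double[][] ratings, int user) {
        int items = ratings[0].length;
        double bestScore = -1;
        int bestItem = -1;
        for (int item = 0; item < items; item++) {
            if (ratings[user][item] != 0.0) continue; // already rated
            double score = 0;
            for (int other = 0; other < ratings.length; other++) {
                if (other == user || ratings[other][item] == 0.0) continue;
                score += cosine(ratings[user], ratings[other]) * ratings[other][item];
            }
            if (score > bestScore) { bestScore = score; bestItem = item; }
        }
        return bestItem;
    }
}
```

Spark’s ALS replaces this brute-force similarity pass with learned latent factors, which is what makes it scale to datasets like MovieLens.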

Spark’s collaborative filtering implementation is described here:

The MovieLens data can be found here:

Enron Document Search App

Source Code:

Demo Link:



This example demonstrates how to implement different queries against a database, such as a “select” query — find all documents with certain keywords, and a “graph” query — find documents that are linked to users.

Example Data Visualizations:

The Cytoscape graph visualization tool can be used to visualize the above sample data using the Vital AI Cytoscape plugin.

The Cytoscape plugin is available from:

An example of visualizing the MovieLens data:


An example of visualizing the Wordnet Dataset, viewing the graph centered on “Red Wine”:


For generating and importing the Wordnet data, see sample code here:

Information about Wordnet is available here:

Another example of the Wordnet data, with some additional visual styles added:


Data Visualization with Vital AI, Wordnet, and Cytoscape

In this series of blog posts, I’ll provide an example of using the Vital AI Development Kit (VDK) for Data Visualization.

One of my favorite visualization applications is Cytoscape ( ).  Cytoscape is often used in Life Science research applications, but can be used for any graph visualization need.  I highly recommend giving it a whirl.  In this example, we’ll create a plugin to Cytoscape to connect with the Vital AI software.

Wordnet is a wonderful dataset that captures many types of relationships among words and word categories, including relationships like “part-of” as in “hand is part-of an arm” and “member-of” as in “soldier is member-of an army”.  Wordnet was developed at Princeton University ( ).

Because Wordnet contains relationships between words, it’s an ideal dataset to use for graph visualization.  The technique can be applied to many different types of data.

For this example, we will:

  • Generate a dataset from the source Wordnet data, ready to load into Vital AI
  • Create a plugin to Cytoscape to connect to the Vital AI software via the VDK API
  • Visually explore the Wordnet data, perform some graph analysis, and use the analysis output as part of our visualization.

Once complete, the Cytoscape interface viewing the Wordnet data via the underlying Vital AI VDK API will look like:


Next Post:

Building a Data Visualization Plugin with the Vital AI Development Kit

Part of a series beginning with:

As of the last post, we have a dataset generated from Wordnet according to our data model and imported into a Vital Service Endpoint, which could be on our local machine or available remotely. For simplicity, we’ll assume it’s a local endpoint.

Cytoscape 3.X (the current version) allows creating “apps” (formerly known as “plugins”) that allow adding additional functionality to the Cytoscape desktop application.

The documentation for creating such apps can be found here:

Cytoscape supports the OSGi standard ( ), which can be a little tricky but provides a general way to include dependencies.

The source code for the Vital Service Cytoscape App is found on github here:

The two main implemented functions are: (1) searching the Wordnet data for a matching SynsetNode, and (2) given a particular Wordnet SynsetNode, performing a graph query to add all connected Edges and Nodes into the current graph.

For the first case, the “search” function is run in the SearchTab to produce a set of matching SynsetNodes. The snippet of code in SearchTab is:

ResultList rs = Application.get().search(selectquery);

For the second case, the “Expand Node” function is run when it’s selected by using the contextual menu on a Node (or set of Nodes).  The snippet of code in ExpandNodeTask is:

ResultList rs_connections = Application.get().getConnections(uri_string);

Within the Application class, we connect to VitalService and perform the select query, returning the found objects:

ResultList rlist = Factory.getVitalService().selectQuery(sq);

Within the Application class, we connect to VitalService and “expand” a graph object, triggering getting all the connected Nodes and Edges into the local cache, which we can then display.

GraphObject graphObjectExpanded = Factory.getVitalService().getExpanded(VitalURI.withString(uri_str), getWordnetSegment());

The Vital AI Cytoscape App adds some tabs and contextual menus to the Cytoscape user interface.

Here’s the Search Tab displaying search results:


and here is the contextual menu associated with a SynsetNode, used to trigger “expanding” the Node:


Put it all together, and we can explore graph data stored in a Vital AI Endpoint using our new Cytoscape App!

Next we’ll use some Graph Analytics to help visualize our data.

Next Post:

Visualizing Data with Graph Analytics with the Vital AI Development Kit

Part of a series beginning with:

In the previous post, we created a Cytoscape App connected to a Vital Service Endpoint containing the Wordnet dataset. The App can search the Wordnet data and “expand” it by adding connected Nodes and Edges to the visualized network.

Now let’s use some graph analytics to help visualize a network. We’ll be performing the analysis locally within Cytoscape. For a very large graph we would be using a server cluster to perform the analysis. The Vital AI VDK and Platform enable this by running the analysis within a Hadoop Cluster. However, for this example, we’ll be using a relatively small subset of the Wordnet data.

First let’s search for “car” (in the sense of “automobile”), add it to our network, and expand to get all nodes and edges up to 3 hops away. This gives us about 1,100 Nodes and around 1,500 Edges. Initially they are in a jumble, sometimes called a “hair-ball”.


Now, let’s run our network analysis, available from the “Tools” menu.


By running the network analysis, we calculate various metrics about the network, such as how many edges are associated with each node (its “degree”).  Another such metric is “centrality”, a calculation of how “central” a node is to the network.  Central nodes can be more “important”, such as influencers in a social network.
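Cytoscape computes these metrics for us, but the underlying ideas are simple.  A small generic sketch of degree and closeness centrality (one common centrality measure) on an undirected graph:

```java
import java.util.*;

public class NetworkMetrics {

    // Build an undirected adjacency map from "a-b" edge strings.
    static Map<String, Set<String>> graph(String... edges) {
        Map<String, Set<String>> adj = new HashMap<>();
        for (String e : edges) {
            String[] parts = e.split("-");
            adj.computeIfAbsent(parts[0], k -> new HashSet<>()).add(parts[1]);
            adj.computeIfAbsent(parts[1], k -> new HashSet<>()).add(parts[0]);
        }
        return adj;
    }

    // Degree: how many edges touch this node.
    static int degree(Map<String, Set<String>> adj, String node) {
        return adj.getOrDefault(node, Set.of()).size();
    }

    // Closeness centrality: (n - 1) divided by the sum of shortest-path
    // distances to every reachable node; higher means more "central".
    static double closeness(Map<String, Set<String>> adj, String node) {
        Map<String, Integer> dist = new HashMap<>();
        dist.put(node, 0);
        Deque<String> queue = new ArrayDeque<>(List.of(node));
        while (!queue.isEmpty()) {                 // breadth-first search
            String cur = queue.poll();
            for (String nb : adj.getOrDefault(cur, Set.of())) {
                if (!dist.containsKey(nb)) {
                    dist.put(nb, dist.get(cur) + 1);
                    queue.add(nb);
                }
            }
        }
        int total = dist.values().stream().mapToInt(Integer::intValue).sum();
        return total == 0 ? 0 : (dist.size() - 1) / (double) total;
    }
}
```

On a star-shaped graph the hub gets the maximum closeness of 1.0, which matches the intuition that the most connected word sense sits at the center of its neighborhood.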

Next, we associate some of these metrics with the network visualization.  We can map node size to degree and node color to centrality.  The redder a node, the more “important” it is.


We use the centrality associated with the edges to help visually lay out the network, showing some underlying structure, using options in the “Layout” menu.


Next we can zoom in on the middle of the network.


The node representing “car, automobile” is a deep red as it is the most central and important part of the graph.

Panning around, we can find “motor vehicle”:


“Motor Vehicle” is a reddish-yellow, reflecting that it is important, though not as important as “car, automobile”.

Panning over to “airplane”, we see that it’s bright yellow, with its sub-types like “biplane” a bright green, reflecting that they are not central and not “important” by our metric.  This is no surprise, as “airplane” is a bit removed from the rest of the “car, automobile” network (they do have “motor vehicle” in common), and “biplane” is even further removed.


Cytoscape has many layout and visualization features, and paired with a Big Data repository via the Vital AI VDK, makes a compelling data analysis system.

Using the contextual menu also allows the App to be a great data exploration application to discover new ways the data is connected.

Hope you have enjoyed this series on integrating a Data Visualization application with Vital AI using the Vital AI Development Kit!