Data Visualization with Vital AI, Wordnet, and Cytoscape

In this series of blog posts, I’ll provide an example of using the Vital AI Development Kit (VDK) for Data Visualization.

One of my favorite visualization applications is Cytoscape ( http://www.cytoscape.org/ ). Cytoscape is often used in life science research, but it can be used for any graph visualization need. I highly recommend giving it a whirl. In this example, we’ll create a plugin for Cytoscape that connects to the Vital AI software.

Wordnet is a wonderful dataset that captures many types of relationships among words and word categories, including “part-of”, as in “a hand is part of an arm”, and “member-of”, as in “a soldier is a member of an army”. Wordnet was developed at Princeton University ( http://wordnet.princeton.edu/ ).

Because Wordnet contains relationships between words, it’s an ideal dataset to use for graph visualization.  The technique can be applied to many different types of data.

For this example, we will:

  • Generate a dataset from the source Wordnet data, ready to load into Vital AI
  • Create a plugin for Cytoscape that connects to the Vital AI software via the VDK API
  • Visually explore the Wordnet data, perform some graph analysis, and use the analysis output as part of our visualization.

Once complete, the Cytoscape interface, viewing the Wordnet data via the underlying Vital AI VDK API, will look like this:

[Screenshot: the Cytoscape interface displaying Wordnet data via the Vital AI VDK API]

Next Post: https://vitalai.com/2014/04/29/generating-a-wordnet-dataset-using-vital-ai-development-kit/

Generating a Wordnet Dataset using Vital AI Development Kit

Part of a series beginning with:

https://vitalai.com/2014/04/29/data-visualization-with-vital-ai-wordnet-and-cytoscape/

To import a new dataset into Vital AI with the VDK, we first extend our data model with any classes and properties needed to represent the dataset.

In the case of Wordnet, since we like to use it as an example, we have added its classes and properties to the main Vital data model (vital.owl).

The main Node we’ve defined is the SynsetNode, since Wordnet uses “synset” objects to represent synonym sets. This node has sub-classes for Verbs, Adjectives, Adverbs, and Nouns, one for each type of word.

[Image: the SynsetNode class and its sub-classes in the Vital data model]
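To make the node hierarchy concrete, here is a minimal plain-Java sketch (Java 17). The class names NounSynsetNode, VerbSynsetNode, AdjectiveSynsetNode and AdverbSynsetNode are hypothetical stand-ins, not the classes the VDK generates from vital.owl. The single-letter keys are the standard Wordnet part-of-speech tags (n, v, a, r), which JWI's POS.getTag() also returns; mapping a tag to a subclass is one possible way to pick the node class for each part of speech.

```java
import java.util.Map;
import java.util.function.Supplier;

// Launcher class first so the file can be run directly as a single source file.
class SynsetNodes {
    // Wordnet part-of-speech tags: n = noun, v = verb, a = adjective, r = adverb.
    static final Map<Character, Supplier<SynsetNode>> BY_TAG =
            Map.<Character, Supplier<SynsetNode>>of(
                    'n', NounSynsetNode::new,
                    'v', VerbSynsetNode::new,
                    'a', AdjectiveSynsetNode::new,
                    'r', AdverbSynsetNode::new);

    // Creates an empty node of the subclass matching a part-of-speech tag.
    static SynsetNode forTag(char tag) {
        Supplier<SynsetNode> factory = BY_TAG.get(tag);
        if (factory == null) {
            throw new IllegalArgumentException("unknown part-of-speech tag: " + tag);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        System.out.println(forTag('n').getClass().getSimpleName()); // NounSynsetNode
    }
}

// Hypothetical plain-Java stand-ins for the classes defined in vital.owl.
abstract class SynsetNode {
    String uri;       // unique identifier
    String name;      // the words in the synonym set
    String gloss;     // short definition
    String wordnetID; // Wordnet identifier
}

class NounSynsetNode extends SynsetNode {}
class VerbSynsetNode extends SynsetNode {}
class AdjectiveSynsetNode extends SynsetNode {}
class AdverbSynsetNode extends SynsetNode {}
```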

To connect the Wordnet SynsetNodes together, we represent the various Wordnet relationship types as Edges (there are a bunch). Two such relationships are HyperNym and HypoNym, which capture the type-of (or is-a) relationship, such as that between Tiger and Animal or between Red and Color. The more general term is the hypernym: Animal is a hypernym of Tiger, and Tiger is a hyponym of Animal.

More information about HyperNyms and HypoNyms is available via Wikipedia here:  http://en.wikipedia.org/wiki/Hyponymy_and_hypernymy.

[Image: Wordnet relationship Edge classes, such as HyperNym and HypoNym, in the Vital data model]
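Because hypernymy and hyponymy are inverses, a HypoNym edge can be derived from a HyperNym edge by swapping its endpoints. The Java sketch below assumes a hypothetical Edge record whose direction follows Wordnet's pointers, so a HyperNym edge runs from the specific concept to the general one; the URIs are made up.

```java
class WordnetEdges {
    // Hypothetical typed, directed edge between two node URIs. For "HyperNym",
    // the destination is the more general concept (tiger -> animal).
    record Edge(String type, String sourceURI, String destinationURI) {}

    // Hypernymy and hyponymy are inverses: if animal is a hypernym of tiger,
    // then tiger is a hyponym of animal, so the edge flips direction and type.
    static Edge inverse(Edge e) {
        String inverseType = switch (e.type()) {
            case "HyperNym" -> "HypoNym";
            case "HypoNym" -> "HyperNym";
            default -> throw new IllegalArgumentException("no inverse defined for " + e.type());
        };
        return new Edge(inverseType, e.destinationURI(), e.sourceURI());
    }

    public static void main(String[] args) {
        Edge tigerIsAnAnimal = new Edge("HyperNym", "urn:example:tiger", "urn:example:animal");
        System.out.println(inverse(tigerIsAnAnimal));
    }
}
```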

The current version of the Vital AI ontologies is available on GitHub here: https://github.com/vital-ai/vital-ontology/tree/rel-0.1.0

Now that we have our data model ready, we can generate a dataset.

An open-source Java API for accessing the Wordnet dictionary files, the MIT Java Wordnet Interface (JWI), is available from: http://projects.csail.mit.edu/jwi/

We can use this API to help generate our dataset. Code like the following creates all of our Nodes:

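import edu.mit.jwi.item.*

// Assumes _dict is an open JWI IDictionary, writer is the VDK block writer,
// and URIGenerator comes from the VDK.
for (POS p : POS.values()) {

    for (Iterator<ISynset> synsetIterator = _dict.getSynsetIterator(p); synsetIterator.hasNext(); ) {

        ISynset next = synsetIterator.next()

        // short definition of the concept
        String gloss = next.getGloss()

        // lemmas of the words in this synonym set
        List<String> words = next.getWords().collect { IWord w -> w.getLemma() }
        String word_string = words.toString()

        // Wordnet identifier: part-of-speech tag plus synset offset
        String idPart = "${next.getPOS().getTag()}_${((ISynsetID) next.getID()).getOffset()}"

        // cls: the SynsetNode subclass for this part of speech, chosen elsewhere
        SynsetNode sn = cls.newInstance()

        sn.URI = URIGenerator.generateURI("wordnet", cls)
        sn.name = word_string
        sn.gloss = gloss
        sn.wordnetID = idPart

        // remember the URI so the edge pass can look it up by synset ID
        // (synsetWords: a Map<ISynsetID, String> declared before this loop)
        synsetWords.put(next.getID(), sn.URI)

        writer.startBlock()
        writer.writeGraphObject(sn)
        writer.endBlock()
    }
}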

This iterates over the parts of speech and, within each, over the synonym sets (“concepts”). For each synonym set, it collects the associated words and adds a new SynsetNode, setting its URI (unique identifier), its set of words, its gloss (short definition), and its Wordnet identifier.
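The node pass can also be sketched without JWI, using an in-memory stand-in for the dictionary. Everything here is hypothetical: the Synset and Node records, the sample data, and the random URN scheme (the VDK's URIGenerator scheme is not shown in this post). It follows the same steps: loop over parts of speech, then over synsets, build the Wordnet ID from the tag and offset, and record each ID's URI for the edge pass.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class NodePass {
    // Hypothetical stand-in for a Wordnet synset; JWI's ISynset exposes similar
    // data through getOffset(), getWords() and getGloss().
    record Synset(int offset, List<String> lemmas, String gloss) {}

    // Hypothetical node record with the fields set in the node-creation loop.
    record Node(String uri, String name, String gloss, String wordnetID) {}

    // One node per synset, grouped by part-of-speech tag. Each Wordnet ID is
    // also mapped to its node URI so a later edge pass can resolve endpoints.
    static List<Node> buildNodes(Map<Character, List<Synset>> synsetsByPos,
                                 Map<String, String> uriById) {
        List<Node> nodes = new ArrayList<>();
        for (Map.Entry<Character, List<Synset>> pos : synsetsByPos.entrySet()) {
            for (Synset s : pos.getValue()) {
                String wordnetID = pos.getKey() + "_" + s.offset();
                // Assumption: a random URN; the VDK's URIGenerator uses its own scheme.
                String uri = "urn:wordnet:" + UUID.randomUUID();
                uriById.put(wordnetID, uri);
                nodes.add(new Node(uri, s.lemmas().toString(), s.gloss(), wordnetID));
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        Map<Character, List<Synset>> data = new LinkedHashMap<>();
        // Made-up offset and gloss, for illustration only.
        data.put('n', List.of(new Synset(42, List.of("car", "automobile"), "a motor vehicle")));
        Map<String, String> uriById = new HashMap<>();
        for (Node n : buildNodes(data, uriById)) {
            System.out.println(n.wordnetID() + " " + n.name()); // n_42 [car, automobile]
        }
    }
}
```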

Code like this creates all of our Edges:

// synsetWords: Map<ISynsetID, String> from synset ID to node URI, filled during the node pass
for (POS p : POS.values()) {

    for (Iterator<ISynset> synsetIterator = _dict.getSynsetIterator(p); synsetIterator.hasNext(); ) {

        ISynset key = synsetIterator.next()

        String uri = synsetWords.get(key.getID())

        // getRelatedMap(): pointer type (hypernym, hyponym, ...) -> related synset IDs
        for (Map.Entry<IPointer, List<ISynsetID>> next2 : key.getRelatedMap().entrySet()) {

            IPointer type = next2.getKey()
            List<ISynsetID> l = next2.getValue()

            for (ISynsetID id : l) {

                String destURI = synsetWords.get(id)

                // cls: the Edge_hasWordnetPointer subclass matching 'type'
                // (e.g. the HyperNym edge for the hypernym pointer), chosen elsewhere
                Edge_hasWordnetPointer newEdge = cls.newInstance()

                newEdge.URI = URIGenerator.generateURI("wordnet", cls)
                newEdge.sourceURI = uri
                newEdge.destinationURI = destURI

                writer.startBlock()
                writer.writeGraphObject(newEdge)
                writer.endBlock()
            }
        }
    }
}

This again iterates over the parts of speech and all of their synsets, gets each synset’s set of relationships, and adds an Edge for each relationship, using the specific Edge type for that relationship, such as HyperNym or HypoNym.
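The edge pass can be sketched the same way. Wordnet identifies relationship types by pointer symbols such as @ (hypernym), ~ (hyponym), #m/%m (member holonym/meronym) and #p/%p (part holonym/meronym); the edge type names they map to here, and the ID-to-URI map, are assumptions for illustration. As a design choice, edges whose endpoints cannot be resolved are skipped rather than written with a missing URI.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class EdgePass {
    // Hypothetical typed, directed edge between two node URIs.
    record Edge(String type, String sourceURI, String destinationURI) {}

    // Standard Wordnet pointer symbols mapped to hypothetical edge type names;
    // other pointer types are skipped in this sketch.
    static final Map<String, String> EDGE_TYPE_BY_SYMBOL = Map.of(
            "@", "HyperNym",
            "~", "HypoNym",
            "#m", "MemberHolonym",  // e.g. soldier -> army (member points to group)
            "%m", "MemberMeronym",
            "#p", "PartHolonym",    // e.g. hand -> arm (part points to whole)
            "%p", "PartMeronym");

    // Turns one synset's related map (pointer symbol -> related Wordnet IDs) into
    // typed edges, resolving both endpoints through the ID -> URI map from the node pass.
    static List<Edge> edgesFor(String sourceId, Map<String, List<String>> related,
                               Map<String, String> uriById) {
        List<Edge> edges = new ArrayList<>();
        String sourceURI = uriById.get(sourceId);
        if (sourceURI == null) {
            return edges; // unknown source synset: nothing to connect
        }
        for (Map.Entry<String, List<String>> entry : related.entrySet()) {
            String type = EDGE_TYPE_BY_SYMBOL.get(entry.getKey());
            if (type == null) {
                continue; // pointer type not modeled in this sketch
            }
            for (String targetId : entry.getValue()) {
                String destURI = uriById.get(targetId);
                if (destURI != null) { // skip dangling references
                    edges.add(new Edge(type, sourceURI, destURI));
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        Map<String, String> uriById = Map.of("n_1", "urn:example:hand", "n_2", "urn:example:arm");
        System.out.println(edgesFor("n_1", Map.of("#p", List.of("n_2")), uriById));
    }
}
```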

With this, all of our Nodes and Edges are written to a dataset file (see previous blog entries for our file “block” format).

We can then import the dataset file into a local or remote Vital Service endpoint instance.

Next Post: https://vitalai.com/2014/04/29/building-a-data-visualization-plugin-with-the-vital-ai-development-kit/

Building a Data Visualization Plugin with the Vital AI Development Kit

In the last post, we generated a dataset from Wordnet according to our data model and imported it into a Vital Service endpoint, which could be on our local machine or remote. For simplicity, we’ll assume it’s a local endpoint.

Cytoscape 3.X (the current version) supports “apps” (formerly known as “plugins”), which add functionality to the Cytoscape desktop application.

The documentation for creating such apps can be found here: http://wiki.cytoscape.org/Cytoscape_3/AppDeveloper

Cytoscape supports the OSGi standard ( http://en.wikipedia.org/wiki/OSGi ), which can be a little tricky, but it provides a general way to include dependencies.

The source code for the Vital Service Cytoscape App is on GitHub here: https://github.com/vital-ai/vital-cytoscape

The two main implemented functions are: (1) searching the Wordnet data for a matching SynsetNode, and (2) given a particular Wordnet SynsetNode, performing a graph query to add all connected Edges and Nodes into the current graph.

For the first case, the “search” function is run in the SearchTab to produce a set of matching SynsetNodes. The snippet of code in SearchTab is:

ResultList rs = Application.get().search(selectquery);
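Outside of Cytoscape, the idea behind the search can be illustrated with a minimal in-memory sketch. It is not the VitalService select query; the SynsetNode record and exact-word matching are assumptions, chosen so that a search for “car” does not also return “card”.

```java
import java.util.List;
import java.util.stream.Collectors;

class SearchSketch {
    // Hypothetical node with its URI and the words of its synonym set.
    record SynsetNode(String uri, List<String> words) {}

    // Returns the nodes with a word equal to the query, ignoring case, so that
    // "car" matches the car/automobile synset but not "card".
    static List<SynsetNode> search(List<SynsetNode> nodes, String query) {
        return nodes.stream()
                .filter(n -> n.words().stream().anyMatch(w -> w.equalsIgnoreCase(query)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SynsetNode> nodes = List.of(
                new SynsetNode("urn:example:1", List.of("car", "automobile")),
                new SynsetNode("urn:example:2", List.of("card")));
        System.out.println(search(nodes, "Car")); // only urn:example:1 matches
    }
}
```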

For the second case, the “Expand Node” function runs when the user selects it from the contextual menu on a Node (or set of Nodes). The snippet of code in ExpandNodeTask is:

ResultList rs_connections = Application.get().getConnections(uri_string);

Within the Application class, we connect to VitalService and perform the select query, returning the found objects:

ResultList rlist = Factory.getVitalService().selectQuery(sq);

Also within the Application class, we connect to VitalService and “expand” a graph object, which fetches all of its connected Nodes and Edges into the local cache so that we can display them:

GraphObject graphObjectExpanded = Factory.getVitalService().getExpanded(VitalURI.withString(uri_str), getWordnetSegment());
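The “expand” step amounts to pulling every Edge that touches a node, plus the Nodes at the other ends, into a local cache. This Java sketch shows that bookkeeping with a hypothetical Edge record and an in-memory edge list standing in for the VitalService getExpanded call. Returning the count of new edges is a design choice, so that the view only has to draw what changed.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ExpandSketch {
    // Hypothetical edge between two node URIs.
    record Edge(String sourceURI, String destinationURI) {}

    // Hypothetical local cache of the nodes and edges fetched so far.
    final Set<String> cachedNodes = new HashSet<>();
    final Set<Edge> cachedEdges = new HashSet<>();

    // Adds every edge touching uri, plus the node at its other end, to the cache.
    // Returns how many edges were new, i.e. what the view still has to draw.
    int expand(String uri, Collection<Edge> allEdges) {
        cachedNodes.add(uri);
        int added = 0;
        for (Edge e : allEdges) {
            if (!e.sourceURI().equals(uri) && !e.destinationURI().equals(uri)) {
                continue; // edge does not touch the expanded node
            }
            if (cachedEdges.add(e)) {
                added++;
            }
            cachedNodes.add(e.sourceURI());
            cachedNodes.add(e.destinationURI());
        }
        return added;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(new Edge("a", "b"), new Edge("b", "c"), new Edge("c", "d"));
        ExpandSketch cache = new ExpandSketch();
        System.out.println(cache.expand("b", edges) + " new edges, nodes: " + cache.cachedNodes);
    }
}
```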

The Vital AI Cytoscape App adds some tabs and contextual menus to the Cytoscape user interface.

Here’s the Search Tab displaying search results:

[Screenshot: the Search Tab displaying search results]

and here is the contextual menu associated with a SynsetNode, used to trigger “expanding” the Node:

[Screenshot: the contextual menu for a SynsetNode, used to expand the Node]

Put it all together, and we can explore graph data stored in a Vital AI Endpoint using our new Cytoscape App!

Next we’ll use some Graph Analytics to help visualize our data.

Next Post: https://vitalai.com/2014/04/29/visualizing-data-with-graph-analytics-with-the-vital-ai-development-kit/

Visualizing Data with Graph Analytics with the Vital AI Development Kit

In the previous post, we created a Cytoscape App connected to a Vital Service Endpoint containing the Wordnet dataset. The App can search the Wordnet data and “expand” it by adding connected Nodes and Edges to the visualized network.

Now let’s use some graph analytics to help visualize a network. We’ll perform the analysis locally within Cytoscape. For a very large graph, we would use a server cluster instead; the Vital AI VDK and Platform support this by running the analysis within a Hadoop cluster. For this example, however, we’ll use a relatively small subset of the Wordnet data.

First, let’s search for “car” (in the sense of “automobile”), add it to our network, and expand it to get all Nodes and Edges up to 3 hops away. This gives us about 1,100 Nodes and around 1,500 Edges. Initially they are a jumble, sometimes called a “hair-ball”.
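Expanding “up to 3 hops” collects every node within three edges of the starting node, which is a breadth-first search with a depth limit. A minimal sketch, assuming the graph is held as an undirected adjacency map of node identifiers (a hypothetical representation, not the VDK's):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class KHop {
    // Breadth-first search from start over an undirected adjacency map (each edge
    // listed under both endpoints); returns all nodes at most maxHops edges away.
    static Set<String> withinHops(Map<String, List<String>> adj, String start, int maxHops) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String u = queue.poll();
            int d = dist.get(u);
            if (d == maxHops) {
                continue; // at the hop limit: do not look further out
            }
            for (String v : adj.getOrDefault(u, List.of())) {
                if (!dist.containsKey(v)) {
                    dist.put(v, d + 1);
                    queue.add(v);
                }
            }
        }
        return dist.keySet();
    }

    public static void main(String[] args) {
        // A chain a - b - c - d - e.
        Map<String, List<String>> adj = Map.of(
                "a", List.of("b"), "b", List.of("a", "c"), "c", List.of("b", "d"),
                "d", List.of("c", "e"), "e", List.of("d"));
        System.out.println(withinHops(adj, "a", 3)); // a, b, c and d; e is 4 hops away
    }
}
```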

[Screenshot: the unarranged “hair-ball” network around “car”]

Now, let’s run our network analysis, available from the “Tools” menu.

[Screenshot: running the network analysis from the “Tools” menu]

Network analysis calculates various metrics about the network, such as how many edges are connected to each node, which is the node’s “degree”. Another such metric is “centrality”, a measure of how “central” a node is to the network; there are several variants, such as closeness and betweenness centrality. Central Nodes can be more “important”, like influencers in a social network.
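Both metrics are straightforward to compute on a small graph. The sketch below uses a hypothetical undirected adjacency map; for centrality it computes closeness, taken here as the reciprocal of a node's average shortest-path distance to the nodes it can reach. Cytoscape's network analysis reports closeness centrality among several other measures, but this code is an illustration, not its implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Centrality {
    // Degree: how many edges touch each node (each edge is listed under both endpoints).
    static Map<String, Integer> degree(Map<String, List<String>> adj) {
        Map<String, Integer> deg = new HashMap<>();
        adj.forEach((node, neighbors) -> deg.put(node, neighbors.size()));
        return deg;
    }

    // Closeness: the reciprocal of the average shortest-path distance from start
    // to every node it can reach, found by breadth-first search; 0 if isolated.
    static double closeness(Map<String, List<String>> adj, String start) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(start, 0);
        queue.add(start);
        long total = 0;
        while (!queue.isEmpty()) {
            String u = queue.poll();
            for (String v : adj.getOrDefault(u, List.of())) {
                if (dist.putIfAbsent(v, dist.get(u) + 1) == null) {
                    total += dist.get(v); // first visit: BFS distance is the shortest
                    queue.add(v);
                }
            }
        }
        int reached = dist.size() - 1;
        return reached == 0 ? 0.0 : (double) reached / total;
    }

    public static void main(String[] args) {
        // A star: hub "c" linked to leaves x, y and z.
        Map<String, List<String>> star = Map.of(
                "c", List.of("x", "y", "z"),
                "x", List.of("c"), "y", List.of("c"), "z", List.of("c"));
        System.out.println(degree(star).get("c")); // 3
        System.out.println(closeness(star, "c"));  // 1.0
        System.out.println(closeness(star, "x"));  // 0.6
    }
}
```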

Next, we associate some of these metrics with the network visualization: we map node size to degree and node color to centrality. The redder a node is, the more “important” it is.
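A node-size and node-color mapping like this is a pair of simple interpolations. In the sketch below, the size range and the green-yellow-red gradient are assumptions chosen to match the description above; in Cytoscape itself these mappings are configured in the visual style settings rather than written in code.

```java
class VisualMapping {
    // Linearly maps a degree in [minDeg, maxDeg] onto a node size in [minSize, maxSize].
    static double nodeSize(int degree, int minDeg, int maxDeg, double minSize, double maxSize) {
        if (maxDeg == minDeg) {
            return minSize; // all nodes have the same degree
        }
        double t = (double) (degree - minDeg) / (maxDeg - minDeg);
        return minSize + t * (maxSize - minSize);
    }

    // Maps a normalized centrality t in [0, 1] onto green -> yellow -> red,
    // returned as a hex color string.
    static String nodeColor(double t) {
        t = Math.max(0.0, Math.min(1.0, t));
        int r = t < 0.5 ? (int) Math.round(510 * t) : 255;
        int g = t < 0.5 ? 255 : (int) Math.round(510 * (1 - t));
        return String.format("#%02X%02X00", r, g);
    }

    public static void main(String[] args) {
        System.out.println(nodeSize(5, 1, 9, 10.0, 50.0)); // 30.0
        System.out.println(nodeColor(0.0) + " " + nodeColor(0.5) + " " + nodeColor(1.0));
        // #00FF00 #FFFF00 #FF0000
    }
}
```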

[Screenshot: node size mapped to degree and node color to centrality]

Using options in the “Layout” menu, we then use the centrality values associated with the edges to help lay out the network, revealing some of its underlying structure.
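The edge centrality used for layout is likely edge betweenness, which Cytoscape's network analysis reports for edges; that identification is an assumption. Edge betweenness counts how many shortest paths between pairs of nodes run through each edge, and the sketch below computes it with Brandes' algorithm for an unweighted, undirected graph held in a hypothetical adjacency map. Edges with high betweenness tend to bridge clusters, which is one reason such values can help a layout reveal structure.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EdgeBetweenness {
    // Canonical key for an undirected edge, independent of direction.
    static String key(String a, String b) {
        return a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a;
    }

    // Brandes' algorithm, edge variant, for an unweighted undirected graph given as
    // an adjacency map (each edge listed under both endpoints).
    static Map<String, Double> compute(Map<String, List<String>> adj) {
        Map<String, Double> eb = new HashMap<>();
        for (String s : adj.keySet()) {
            Deque<String> stack = new ArrayDeque<>();           // nodes in BFS order
            Map<String, List<String>> preds = new HashMap<>();  // shortest-path predecessors
            Map<String, Long> sigma = new HashMap<>();          // number of shortest paths from s
            Map<String, Integer> dist = new HashMap<>();
            Deque<String> queue = new ArrayDeque<>();
            sigma.put(s, 1L);
            dist.put(s, 0);
            queue.add(s);
            while (!queue.isEmpty()) {
                String v = queue.poll();
                stack.push(v);
                for (String w : adj.getOrDefault(v, List.of())) {
                    if (!dist.containsKey(w)) {
                        dist.put(w, dist.get(v) + 1);
                        queue.add(w);
                    }
                    if (dist.get(w) == dist.get(v) + 1) {
                        sigma.merge(w, sigma.get(v), Long::sum);
                        preds.computeIfAbsent(w, k -> new ArrayList<>()).add(v);
                    }
                }
            }
            // Walk back from the farthest nodes, splitting each node's dependency
            // across its incoming shortest-path edges.
            Map<String, Double> delta = new HashMap<>();
            while (!stack.isEmpty()) {
                String w = stack.pop();
                double dw = delta.getOrDefault(w, 0.0);
                for (String v : preds.getOrDefault(w, List.of())) {
                    double c = (double) sigma.get(v) / sigma.get(w) * (1.0 + dw);
                    eb.merge(key(v, w), c, Double::sum);
                    delta.merge(v, c, Double::sum);
                }
            }
        }
        // Every pair of nodes was counted once from each end.
        eb.replaceAll((k, val) -> val / 2.0);
        return eb;
    }

    public static void main(String[] args) {
        Map<String, List<String>> path = Map.of(
                "a", List.of("b"), "b", List.of("a", "c"), "c", List.of("b"));
        System.out.println(compute(path)); // both edges lie on 2 shortest paths
    }
}
```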

[Screenshot: the network after applying a layout from the “Layout” menu]

Next we can zoom in on the middle of the network.

[Screenshot: zoomed in on the middle of the network]

The node representing “car, automobile” is a deep red as it is the most central and important part of the graph.

Panning around, we can find “motor vehicle”:

[Screenshot: the “motor vehicle” node]

“Motor Vehicle” is a reddish-yellow, reflecting its importance, though it is less important than “car, automobile”.

Panning over to “airplane”, we see that it’s bright yellow, while its sub-types, like “biplane”, are bright green, reflecting that they are not central and not “important” by our metric. This is not a surprise: “airplane” is a bit removed from the rest of the “car, automobile” network, although they do have “motor vehicle” in common, and “biplane” is even further removed.

[Screenshot: “airplane” and its sub-types, such as “biplane”]

Cytoscape has many layout and visualization features, and, paired with a Big Data repository via the Vital AI VDK, it makes a compelling data analysis system.

The contextual menu also makes the App a great data exploration tool for discovering new ways the data is connected.

Hope you have enjoyed this series on integrating a Data Visualization application with Vital AI using the Vital AI Development Kit!