Part of a series beginning with:
https://vitalai.com/2014/04/29/data-visualization-with-vital-ai-wordnet-and-cytoscape/
To import a new dataset into Vital AI with the VDK, the first thing we need to do is add any needed classes and properties into our data model to help model the dataset.
In the case of Wordnet, we like to use it as an example, and so have added classes and properties for it into the main Vital data model (vital.owl).
The main Node we’ve defined is the SynsetNode, as Wordnet uses “synset” objects for synonym-sets. This node has sub-classes for Verbs, Adjectives, Adverbs, and Nouns for those different types of words.

To connect the Wordnet SynSetNodes together, we represent the various Wordnet relationship types as Edges (there are a bunch). Two such relationships are HyperNym and HypoNym which are sometimes called the type-of or is-a relationship, such as the relationship between Tiger/Animal or Red/Color.
More information about HyperNyms and HypoNyms is available via Wikipedia here: http://en.wikipedia.org/wiki/Hyponymy_and_hypernymy.

The current version of the Vital AI ontologies are available on github here:https://github.com/vital-ai/vital-ontology/tree/rel-0.1.0
Now that we have our data model ready, we can generate a dataset.
There is an open-source API to access the Wordnet dictionary files via Java available from: http://projects.csail.mit.edu/jwi/
We can use this API to help generate our dataset with code like this to create all our nodes:
for(POS p : POS.values()) {
for( Iterator
synsetIterator = _dict.getSynsetIterator(p);
synsetIterator.hasNext(); ) {
ISynset next = synsetIterator.next()
String gloss = next.getGloss();
List words = next.getWords();
String word_string = words.toString()
String idPart = "${next.getPOS().getTag()}_${((ISynsetID)next.getID()).getOffset()}"
SynsetNode sn = cls.newInstance();
sn.URI = URIGenerator.generateURI("wordnet", cls)
sn.name = word_string
sn.gloss = gloss
sn.wordnetID = idPart
writer.startBlock()
writer.writeGraphObject(sn)
writer.endBlock()
}
}
This mainly iterates over the parts-of-speech, iterates over the synonym-sets (“concepts”) in each part-of-speech, collects the words associated with each synonym-net, and adds a new SynsetNode for each synonym-set setting a URI (unique identifier), the set of words, the gloss (short definition), and Wordnet identifier.
and code like this to create all our edges:
for(POS p : POS.values()) {
for( Iterator synsetIterator = _dict.getSynsetIterator(p); synsetIterator.hasNext(); ) {
ISynset key = synsetIterator.next();
String uri = synsetWords.get(key.getID())
for( Iterator<Entry<IPointer, List>> iterator2 = key.getRelatedMap().entrySet().iterator(); iterator2.hasNext(); ) {
Entry<IPointer, List> next2 = iterator2.next();
IPointer type = next2.getKey();
List l = next2.getValue();
for(ISynsetID id : l) {
String destURI = synsetWords.get(id);
Edge_hasWordnetPointer newEdge = cls.newInstance();
newEdge.URI = URIGenerator.generateURI("wordnet", cls)
newEdge.sourceURI = uri
newEdge.destinationURI = destURI
writer.startBlock()
writer.writeGraphObject(newEdge)
writer.endBlock()
}
}
}
}
This iterates over the parts-of-speech, iterates over all the synsets, gets the set of relationships for each, and adds an Edge for each such relationship using Edges of specific type, like HyperNym and HypoNym.
With this we have all our Nodes and Edges written to a dataset file (see previous blog entries for our file “block” format).
We can then import the dataset file into local or remote Vital Service endpoint instance.
Next Post: https://vitalai.com/2014/04/29/building-a-data-visualization-plugin-with-the-vital-ai-development-kit/