Generating a Wordnet Dataset using Vital AI Development Kit

Part of a series beginning with:

https://vitalai.com/2014/04/29/data-visualization-with-vital-ai-wordnet-and-cytoscape/

To import a new dataset into Vital AI with the VDK, the first step is to add the classes and properties needed to model that dataset to our data model.

We like to use Wordnet as an example, so we have added classes and properties for it to the main Vital data model (vital.owl).

The main Node we’ve defined is the SynsetNode, since Wordnet groups words into “synset” objects (synonym-sets).  This node has sub-classes for Nouns, Verbs, Adjectives, and Adverbs.


To connect the Wordnet SynsetNodes together, we represent the various Wordnet relationship types as Edges (there are a bunch).  Two such relationships are HyperNym and HypoNym, which together express what is sometimes called the type-of or is-a relationship: Animal is a hypernym of Tiger and Tiger is a hyponym of Animal, and likewise for Color and Red.

More information about HyperNyms and HypoNyms is available on Wikipedia:  http://en.wikipedia.org/wiki/Hyponymy_and_hypernymy
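To make the direction of the two relationships concrete, here is a minimal sketch in plain Java. It is a toy is-a graph and not the Vital AI data model or the Wordnet API: the class `IsAGraph` and its methods are names invented for this illustration. It stores only the hypernym direction and derives hyponyms by looking the relation up in reverse.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy is-a graph for illustration; all names here are hypothetical. */
final class IsAGraph {
    // specific term -> its direct hypernyms (the more general terms)
    private final Map<String, List<String>> hypernyms = new LinkedHashMap<>();

    /** Records that `specific` is a kind of `general`. */
    void addIsA(String specific, String general) {
        hypernyms.computeIfAbsent(specific, k -> new ArrayList<>()).add(general);
    }

    /** Direct hypernyms of a term (empty if none were recorded). */
    List<String> hypernymsOf(String term) {
        return hypernyms.getOrDefault(term, new ArrayList<>());
    }

    /** Direct hyponyms: every term that lists `term` as one of its hypernyms. */
    List<String> hyponymsOf(String term) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : hypernyms.entrySet()) {
            if (e.getValue().contains(term)) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

After `addIsA("tiger", "animal")`, asking for the hypernyms of "tiger" yields "animal", and asking for the hyponyms of "animal" yields "tiger". Wordnet stores both directions explicitly, which is why both show up as separate Edge types.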


The current version of the Vital AI ontologies is available on GitHub here: https://github.com/vital-ai/vital-ontology/tree/rel-0.1.0

Now that we have our data model ready, we can generate a dataset.

An open-source Java API for reading the Wordnet dictionary files, JWI (the MIT Java Wordnet Interface), is available from:  http://projects.csail.mit.edu/jwi/

We can use this API to help generate our dataset with code like this to create all our nodes:

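import edu.mit.jwi.item.ISynset
import edu.mit.jwi.item.IWord
import edu.mit.jwi.item.POS

for (POS p : POS.values()) {

     // cls is the SynsetNode sub-class for this part-of-speech (lookup not shown).
     Iterator<ISynset> synsetIterator = _dict.getSynsetIterator(p)
     while (synsetIterator.hasNext()) {

          ISynset next = synsetIterator.next()
          String gloss = next.getGloss()
          List<IWord> words = next.getWords()
          String word_string = words.toString()

          // Wordnet ID: part-of-speech tag, underscore, synset offset.
          String idPart = "${next.getPOS().getTag()}_${next.getID().getOffset()}"

          SynsetNode sn = cls.newInstance()
          sn.URI = URIGenerator.generateURI("wordnet", cls)
          sn.name = word_string
          sn.gloss = gloss
          sn.wordnetID = idPart

          // Assumption: the synsetWords map read by the edge code is filled here,
          // mapping each synset ID to the URI of its node.
          synsetWords.put(next.getID(), sn.URI)

          writer.startBlock()
          writer.writeGraphObject(sn)
          writer.endBlock()
     }
}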

This iterates over the parts-of-speech and, within each, over the synonym-sets (“concepts”).  For each synonym-set it collects the associated words and writes a new SynsetNode, setting a URI (unique identifier), the set of words, the gloss (short definition), and the Wordnet identifier.
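The identifier and URI bookkeeping in that loop can be sketched on its own. The plain-Java example below is an illustration under assumptions, not VDK or JWI code: `SynsetUriIndex`, its methods, and the `urn:example:wordnet/` prefix are all made up for the sketch. It builds the ID the same way as the `idPart` expression (part-of-speech tag, underscore, offset) and remembers one URI per ID so that a later pass can look nodes up again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for the node pass bookkeeping; every name here is hypothetical. */
final class SynsetUriIndex {
    // Wordnet ID (e.g. tag + "_" + offset) -> URI assigned to that synset's node
    private final Map<String, String> uriById = new LinkedHashMap<>();
    private int counter = 0;

    /** Mirrors the idPart expression: POS tag, underscore, synset offset. */
    static String idPart(char posTag, int offset) {
        return posTag + "_" + offset;
    }

    /** Assigns a URI the first time a synset is seen and returns it thereafter. */
    String register(char posTag, int offset) {
        String id = idPart(posTag, offset);
        // The example URI scheme is invented; the real code uses URIGenerator.
        return uriById.computeIfAbsent(id, k -> "urn:example:wordnet/" + (++counter));
    }

    /** Returns the URI for a Wordnet ID, or null if it was never registered. */
    String lookup(String id) {
        return uriById.get(id);
    }
}
```

Including the part-of-speech tag in the ID matters because a synset offset is a position within one part-of-speech data file, so the same number can appear under two different parts of speech.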

The edges are then created with code like this:

import edu.mit.jwi.item.IPointer
import edu.mit.jwi.item.ISynset
import edu.mit.jwi.item.ISynsetID
import edu.mit.jwi.item.POS

for (POS p : POS.values()) {

     Iterator<ISynset> synsetIterator = _dict.getSynsetIterator(p)
     while (synsetIterator.hasNext()) {

          ISynset key = synsetIterator.next()

          // URI of the source node, looked up by synset ID.
          String uri = synsetWords.get(key.getID())

          // Each entry maps a pointer (relationship) type to its target synsets.
          for (Map.Entry<IPointer, List<ISynsetID>> next2 : key.getRelatedMap().entrySet()) {

               IPointer type = next2.getKey()
               List<ISynsetID> l = next2.getValue()

               // cls is the Edge_hasWordnetPointer sub-class for this pointer type,
               // presumably selected from type (lookup not shown).
               for (ISynsetID id : l) {

                    String destURI = synsetWords.get(id)

                    Edge_hasWordnetPointer newEdge = cls.newInstance()
                    newEdge.URI = URIGenerator.generateURI("wordnet", cls)
                    newEdge.sourceURI = uri
                    newEdge.destinationURI = destURI

                    writer.startBlock()
                    writer.writeGraphObject(newEdge)
                    writer.endBlock()
               }
          }
     }
}

This iterates over the parts-of-speech and all of their synsets, gets the map of relationships for each synset, and adds an Edge for each relationship, using an Edge class specific to the relationship type, such as HyperNym or HypoNym.
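The core of that edge pass, stripped of the Wordnet and VDK types, looks roughly like the plain-Java sketch below. Everything in it is a stand-in: `EdgeSketch`, its nested `Edge` record, and the string-keyed maps are assumptions made for illustration. It also skips targets that have no known URI, a guard that the snippet above does not include.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Stand-in for the edge pass; every name here is hypothetical. */
final class EdgeSketch {

    /** Minimal edge record: relationship name plus source and destination URIs. */
    static final class Edge {
        final String type;
        final String sourceURI;
        final String destinationURI;

        Edge(String type, String sourceURI, String destinationURI) {
            this.type = type;
            this.sourceURI = sourceURI;
            this.destinationURI = destinationURI;
        }
    }

    /**
     * related: relationship name -> target synset IDs for one source synset.
     * uriById: synset ID -> node URI, as recorded while writing the nodes.
     */
    static List<Edge> edgesFor(String sourceId,
                               Map<String, List<String>> related,
                               Map<String, String> uriById) {
        List<Edge> edges = new ArrayList<>();
        String sourceURI = uriById.get(sourceId);
        if (sourceURI == null) {
            return edges; // source node was never written, so emit nothing
        }
        for (Map.Entry<String, List<String>> entry : related.entrySet()) {
            for (String targetId : entry.getValue()) {
                String destURI = uriById.get(targetId);
                if (destURI == null) {
                    continue; // added guard: skip targets with no known URI
                }
                edges.add(new Edge(entry.getKey(), sourceURI, destURI));
            }
        }
        return edges;
    }
}
```

Splitting the work into a node pass and an edge pass is what makes the URI lookup possible: every destination node already has a URI by the time any edge is written.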

With this we have all our Nodes and Edges written to a dataset file (see previous blog entries for our file “block” format).

We can then import the dataset file into a local or remote Vital Service endpoint instance.

Next Post: https://vitalai.com/2014/04/29/building-a-data-visualization-plugin-with-the-vital-ai-development-kit/
