Running shell commands in Beaker Notebook

Data Science Notebooks like Beaker Notebook are a great way not only to explore and analyze data but also to record the steps, so that the next Data Scientist can reproduce the results just by clicking “Run”.

Few people like to spend time meticulously documenting their data analysis steps, so to the degree that Data Science Notebooks can be “self-documenting”, they make things a lot easier.

Because Beaker Notebook can mix many programming languages, such as R, Python, and Groovy, within one Notebook, most steps can be captured completely in a Notebook.

One case that is missing from Beaker, however, is the command line shell.  Bash is the default on Macintosh OSX, but the same gap applies to other shells, including the Windows shell and other Unix shells such as “csh” or “tcsh”.

Shell commands are often used to run data manipulation programs (like awk, perl, or sed) or build tools (like Maven or Ant).

At Vital AI we use shell commands to run “vitalsigns”, which compiles a data model into code that is then used in data analysis, database queries, and machine learning processes (running inside Apache Spark).

Running these within Beaker is not only a convenient way to avoid switching from Beaker to a terminal screen and back, but also documents these steps for reproducibility.

Fortunately, with a little helper class, it’s easy to run shell commands from Groovy cells in the Beaker Notebook.

The helper class is “RunBash.groovy” and can be found on GitHub here:
https://github.com/vital-ai/vital-data-utils/blob/master/src/main/groovy/ai/vital/data/utils/RunBash.groovy

Once a jar with this class is made available to Groovy via the Language Manager (see screenshot below), it can be used in a Groovy cell to run Bash scripts, like so:

import ai.vital.data.utils.RunBash

RunBash.enable() // hook .bash() to strings

//bash script begins here, in a Groovy multi-line string:
"""
echo \$VITAL_HOME

vitalsigns generate -o \${VITAL_HOME}/domain-ontology/vital-samples-0.1.0.owl -or
"""
.bash() // this runs the script
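For readers curious how a `.bash()` method can end up on a string, here is a minimal sketch of the general idea. It is not the actual RunBash source: the class name `BashSketch` and its `run` method are made up for illustration, and it assumes a `bash` executable is on the PATH. It adds the method through Groovy's metaclass and runs the script with `ProcessBuilder`.

```groovy
// Hypothetical stand-in for RunBash; the real class may work differently.
class BashSketch {

    // Adds a .bash() method to String and GString through their metaclasses.
    static void enable() {
        String.metaClass.bash = { -> BashSketch.run(delegate.toString()) }
        GString.metaClass.bash = { -> BashSketch.run(delegate.toString()) }
    }

    // Runs the script with "bash -c", merges stderr into stdout,
    // and returns the combined output once the process has exited.
    static String run(String script) {
        Process process = new ProcessBuilder('bash', '-c', script)
                .redirectErrorStream(true)
                .start()
        String output = process.inputStream.text
        int exitCode = process.waitFor()
        if (exitCode != 0) {
            throw new RuntimeException("bash exited with code ${exitCode}:\n${output}")
        }
        return output
    }
}

BashSketch.enable()

// A single-quoted multi-line string avoids having to escape the dollar sign.
println '''
echo "hello from bash"
'''.bash()
```

Throwing on a non-zero exit code is a design choice of this sketch, so that a failed step stops the cell instead of passing silently.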

Here’s a screenshot of that running in Beaker:

[Screenshot: runbash]

Here’s a screenshot of the Language Manager (from the Notebook menu), with jars added to the Beaker Notebook classpath for Groovy.

[Screenshot: beaker-languagemanager]

Vital AI Cytoscape App in App Store

We recently published our Cytoscape App to the Cytoscape App Store.

Cytoscape is a wonderful open-source graph visualization tool that runs on the desktop and is quite handy for graph analysis and visualization.

Our plugin allows Cytoscape to connect to databases, servers, and Apache Spark/Hadoop using the VitalService API.

The plugin is available directly in Cytoscape, or here: http://apps.cytoscape.org/apps/vitalaigraphvisualization

Cytoscape is available here: http://cytoscape.org/

Prior to using the plugin, the Vital AI software must be installed and configured.  The Vital AI software can be downloaded from here: http://vital.ai/#download

The Cytoscape plugin uses the VITAL_HOME environment variable to find the Vital AI software and configuration files.

On Mac OSX, desktop applications like Cytoscape need some extra help to use environment variables, because applications launched from the Finder or Dock do not inherit variables set in shell startup files.

Here is a good StackOverflow answer that explains how to make environment variables available to Mac OSX applications: http://stackoverflow.com/a/32405815/2138426

The first tab in the Vital AI plugin enables selecting which VitalService endpoint to connect to.  These come from the VitalService configuration file found at:

$VITAL_HOME/vital-config/vitalservice/vitalservice.config
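If the plugin cannot find its configuration, a quick sanity check is to resolve that path yourself. The snippet below is only a sketch: the helper name `configPathFor` is made up, and it reports what the process running the script sees, which may differ from what a desktop application like Cytoscape sees.

```groovy
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical helper: builds the config path described above from a VITAL_HOME value.
Path configPathFor(String vitalHome) {
    Paths.get(vitalHome, 'vital-config', 'vitalservice', 'vitalservice.config')
}

String vitalHome = System.getenv('VITAL_HOME')
if (vitalHome == null) {
    println 'VITAL_HOME is not set for this process'
} else {
    Path config = configPathFor(vitalHome)
    println "${config} exists: ${Files.exists(config)}"
}
```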

For Prime endpoints, an authorization key is used to connect.  For convenience, you can put such keys into your vitalsigns configuration file, like so:

config: {
    local-key: key1-key1-key1
}

The naming convention is: “vitalservicename”-key

So, the corresponding VitalService configuration entry would be:

profile.local {
    type = VitalPrime
    appID = analytics
    VitalPrime {
        endpointURL = "http://127.0.0.1:9081/java"
    }
}
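To make the naming convention concrete, here is a tiny sketch of how the profile name “local” maps to the key entry “local-key”. The helper `keyNameFor` and the map standing in for the parsed config section are illustrative assumptions, not part of the Vital AI API.

```groovy
// Hypothetical helper illustrating the "<vitalservicename>-key" convention.
String keyNameFor(String profileName) {
    "${profileName}-key"
}

// Stand-in for the "config" section of the vitalsigns configuration file.
Map<String, String> vitalsignsConfig = ['local-key': 'key1-key1-key1']

String keyName = keyNameFor('local')
println "${keyName} -> ${vitalsignsConfig[keyName]}"
```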

 

Now back to the plugin…

[Screenshot: cytoscape-screenshot1]

Here is the Connection tab.  Select the desired VitalService endpoint from the drop-down and hit “Connect”.

Now that we have connected, we can use the “Search” tab to search.

[Screenshot: cytoscape-screenshot4]

We can select which databases (“Segments”) to include, as well as what property to search.  In this case the “Wordnet” database is selected and the “name” property.

Let’s put all the results into a network.

[Screenshot: cytoscape-screenshot5]

Now let’s select them all and find what is connected to them.  This is called an “expansion” query and looks for everything connected to each starting node, up to two hops (edges) away.

[Screenshot: cytoscape-screenshot6]

Starting the expansion…

[Screenshot: cytoscape-screenshot7]

Expanding all the selected nodes…  The “Paths” tab is used to select whether the expansion will be for one hop (one edge) or two hops (two edges), the direction of the desired edges (forward, backward, or both), and which “Segments” to include in the expansion.
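For intuition, an expansion with a hop limit and an edge direction can be sketched as a breadth-first walk over an in-memory edge list. This is only an illustration of the idea, not how VitalService runs the query: the function `expand`, its parameters, and the sample edges are all made up, and segment filtering is left out.

```groovy
// Minimal in-memory sketch of an expansion query; not the VitalService API.
// Each edge is a [source, destination] pair.
// direction is 'forward', 'backward', or 'both'.
Set<String> expand(List<List<String>> edges, Collection<String> startNodes,
                   int maxHops, String direction) {
    Set<String> visited = new LinkedHashSet<>(startNodes)
    Set<String> frontier = new LinkedHashSet<>(startNodes)
    maxHops.times {
        Set<String> discovered = new LinkedHashSet<>()
        edges.each { List<String> edge ->
            String source = edge[0]
            String destination = edge[1]
            // Follow the edge in its own direction.
            if (direction in ['forward', 'both'] && source in frontier) {
                discovered << destination
            }
            // Follow the edge against its direction.
            if (direction in ['backward', 'both'] && destination in frontier) {
                discovered << source
            }
        }
        discovered.removeAll(visited)   // keep only newly reached nodes
        visited.addAll(discovered)
        frontier = discovered           // the next hop starts from the new nodes
    }
    return visited
}

// Made-up sample graph: x -> a -> b -> c -> d
List<List<String>> sampleEdges = [['a', 'b'], ['b', 'c'], ['c', 'd'], ['x', 'a']]

println expand(sampleEdges, ['a'], 1, 'forward')
println expand(sampleEdges, ['a'], 2, 'both')
```

With one forward hop from “a” only “b” is reached; with two hops in both directions the walk also picks up “x” and “c”, but “d” stays out of range.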

[Screenshot: cytoscape-screenshot8]

And now we have some results!

[Screenshot: cytoscape-screenshot9]

Let’s zoom in on part of the network.  We can then further analyze the results, continue to explore and expand the network, or tune the visualization.

[Screenshot: cytoscape-screenshot10]

If we are connected to a Prime endpoint, we can use the Paths tab to select the node and edge types to filter with during an expansion query.

[Screenshot: cytoscape-screenshot11]

Also with a Prime endpoint, we can use the “DataScripts” tab to run datascripts on the server.  Datascripts can be used to analyze data, trigger Spark or Hadoop jobs, use a prediction model, or anything you like.

Please send along any comments or questions. We hope you enjoy using Cytoscape and our plugin to visualize your data.