Running shell commands in Beaker Notebook

Data Science Notebooks like Beaker Notebook are a great way not only to explore and analyze data but also to record the steps, so that the next Data Scientist can reproduce the results just by clicking “Run”.

Few people like to spend time meticulously documenting their data analysis steps, so to the degree that Data Science Notebooks can be “Self Documenting”, they make things a lot easier.

Because Beaker Notebook can mix many programming languages, such as R, Python, and Groovy, within one Notebook, most steps can be captured completely in a Notebook.

One case that is missing from Beaker, however, is the command-line shell. Bash is the default on Macintosh OSX, but the same applies to other shells, including the Windows shell or other Unix shells such as “csh” or “tcsh”.

Shell commands are often used to run data manipulation programs (like awk, perl, or sed) or build and compile processes (like maven or ant).

At Vital AI we use shell commands to run “vitalsigns”, which compiles a data model into code that is then used in data analysis, database queries, and machine learning processes (running inside Apache Spark).

Running these within Beaker is not only a convenient way to avoid switching from Beaker to a terminal and back, but also a way to document these steps for reproducibility.

Fortunately, with a little helper class, it’s easy to run shell commands from Groovy cells in Beaker Notebook.

The helper class is “RunBash.groovy”, and it is available on GitHub here:

Once a jar with this class is made available to Groovy via the Language Manager (see screenshot below), it can be used in a Groovy cell to run Bash scripts, like so:


RunBash.enable() // hook .bash() to strings

// bash script begins here, in a Groovy multi-line string;
// "\$" stops Groovy from interpolating ${VITAL_HOME}, so bash expands it instead
"""
vitalsigns generate -o \${VITAL_HOME}/domain-ontology/vital-samples-0.1.0.owl -or
""".bash() // this runs the script

Here’s a screenshot of that running in Beaker:


Here’s a screenshot of the Language Manager (from the Notebook menu), with jars added to the Beaker Notebook classpath for Groovy.
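
For readers curious how a helper can add “.bash()” to strings, here is a minimal sketch of the same idea. It is not the RunBash.groovy source: the class name “RunBashSketch” is hypothetical, and it assumes a “bash” executable is on the PATH. It uses Groovy's metaClass to attach a bash() method to java.lang.String, runs the text with “bash -c”, echoes the output, and fails on a non-zero exit status.

```groovy
// Hypothetical sketch of a RunBash-style helper; NOT the actual RunBash.groovy source.
// Assumes a "bash" executable is available on the PATH.
class RunBashSketch {

    // Attach a no-argument bash() method to java.lang.String via Groovy's metaClass.
    static void enable() {
        String.metaClass.bash = { ->
            // 'delegate' is the String that bash() was called on: the script text.
            Process proc = ['bash', '-c', delegate as String].execute()
            def out = new StringBuilder()
            def err = new StringBuilder()
            // Drain stdout and stderr while waiting, so a chatty command cannot block.
            proc.waitForProcessOutput(out, err)
            // Echo both streams so they appear in the notebook cell's output.
            if (out) System.out.print(out.toString())
            if (err) System.err.print(err.toString())
            int code = proc.exitValue()
            if (code != 0) {
                throw new RuntimeException("bash exited with status ${code}")
            }
            return out.toString()
        }
    }
}

RunBashSketch.enable()

// "\$" keeps Groovy from interpolating, so bash expands ${GREETING} itself.
def result = """
GREETING=hello
echo "\${GREETING} from bash"
""".bash()
```

Two design notes on this sketch: waitForProcessOutput blocks until the command finishes, so a long-running script holds the cell until it is done; and only java.lang.String is extended, so a string that really interpolates a Groovy variable (a GString) would need “.toString().bash()”.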



Vital AI Cytoscape App in App Store

We recently published our Cytoscape App to the Cytoscape App Store.

Cytoscape is a wonderful open-source graph visualization tool that runs on the desktop and is quite handy for graph analysis.

Our plugin allows Cytoscape to connect to databases, servers, and Apache Spark/Hadoop using the VitalService API.

The plugin is available directly in Cytoscape, or here:

Cytoscape is available here:

Before using the plugin, you must install and configure the Vital AI software, which can be downloaded from here:

The Cytoscape plugin uses the VITAL_HOME environment variable to find the Vital AI software and configuration files.
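
A desktop application only sees the environment it was launched with, so one way to check what a JVM program such as Cytoscape would see is to read VITAL_HOME from the process environment. This Groovy sketch is hypothetical (“resolveVitalHome” is not part of the plugin); it only checks that the variable is set and points at a directory.

```groovy
// Hypothetical helper, not part of the Vital AI plugin: resolve VITAL_HOME the way
// any JVM program sees it, i.e. from the environment the process was launched with.
File resolveVitalHome(Map<String, String> env = System.getenv()) {
    String path = env.get('VITAL_HOME')
    if (!path) {
        // A common symptom on Mac OSX when the app was started from the Finder or Dock.
        throw new IllegalStateException('VITAL_HOME is not set for this process')
    }
    File dir = new File(path)
    if (!dir.isDirectory()) {
        throw new IllegalStateException("VITAL_HOME is not a directory: ${path}")
    }
    return dir
}

// Usage: check the current process, or pass a map to simulate another environment.
try {
    println "VITAL_HOME -> ${resolveVitalHome().absolutePath}"
} catch (IllegalStateException e) {
    println "Problem: ${e.message}"
}
```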

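On Mac OSX, desktop applications like Cytoscape need some extra help to use environment variables, because applications launched from the Finder or Dock do not pick up variables set in shell startup files such as ~/.bash_profile.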

Here is a good StackOverflow answer that explains how to make environment variables available to Mac OSX applications:

The first tab in the Vital AI plugin lets you select which VitalService endpoint to connect to. The endpoints come from the VitalService configuration file found at:


For Prime endpoints, an authorization key is used to connect.  For convenience, you can put such keys into your vitalsigns configuration file, like so:

config: {
  local-key: key1-key1-key1
}

The naming convention is “vitalservicename-key”; for example, the key for the VitalService profile named “local” is stored as “local-key”.

So, the corresponding VitalService configuration entry would be:

profile.local {
  type = VitalPrime
  appID = analytics
  VitalPrime {
    endpointURL = "…"
  }
}

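The naming convention amounts to a simple lookup: append “-key” to the profile name and read that entry from the vitalsigns configuration. A minimal Groovy sketch, with the configuration modeled as a plain map rather than parsed from the actual file; “keyNameFor” and “lookupKey” are hypothetical names.

```groovy
// Hypothetical sketch of the "<vitalservicename>-key" convention. The vitalsigns
// configuration is modeled as a plain map instead of parsing the real file.
String keyNameFor(String profileName) {
    // toString() turns the GString into a String so it works as a map key.
    return "${profileName}-key".toString()
}

String lookupKey(Map<String, String> vitalsignsConfig, String profileName) {
    String name = keyNameFor(profileName)
    String key = vitalsignsConfig.get(name)
    if (key == null) {
        throw new IllegalArgumentException("no '${name}' entry for profile '${profileName}'")
    }
    return key
}

// The example entry from the configuration shown earlier: profile "local" uses "local-key".
def config = ['local-key': 'key1-key1-key1']
def key = lookupKey(config, 'local')
```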
Now back to the plugin…


Here is the Connection tab.  Select the desired VitalService endpoint from the drop-down and hit “Connect”.

Now that we have connected, we can use the “Search” tab to search.


We can select which databases (“Segments”) to include, as well as what property to search.  In this case the “Wordnet” database is selected and the “name” property.
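
Conceptually, the Search tab filters the objects in the chosen segments by one property. The sketch below models that over an in-memory list; the sample data, the “search” function, and the case-insensitive substring matching are all assumptions for illustration, not how VitalService actually implements search.

```groovy
// Hypothetical in-memory model: each object has a segment and a map of properties.
// Not the VitalService API; it only mirrors the choices offered by the Search tab.
List<Map> search(List<Map> objects, Set<String> segments, String property, String term) {
    String needle = term.toLowerCase()
    return objects.findAll { obj ->
        // Keep objects in a selected segment whose property contains the term.
        obj.segment in segments &&
            obj.props[property]?.toString()?.toLowerCase()?.contains(needle)
    }
}

// Made-up sample data.
def objects = [
    [uri: 'urn:w1', segment: 'wordnet', props: [name: 'Happy']],
    [uri: 'urn:w2', segment: 'wordnet', props: [name: 'Sad']],
    [uri: 'urn:o1', segment: 'other',   props: [name: 'happy hour']],
]
def hits = search(objects, ['wordnet'] as Set, 'name', 'happy')
```

Here “urn:o1” also matches the term but is excluded because its segment was not selected.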

Let’s put all the results into a network.



Now let’s select them all and find what is connected to them. This is called an “expansion” query, and it looks for everything connected to the starting nodes up to two hops (edges) away.


Starting the expansion…


Expanding all the selected nodes…  The “Paths” tab is used to select whether the expansion will be for one hop (one edge) or two hops (two edges), the direction of the desired edges (forward, backward, or both), and which “Segments” to include in the expansion.
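
An expansion can be thought of as a breadth-first search that stops after one or two hops and follows edges forward, backward, or both. This Groovy sketch is hypothetical (“expand”, “Edge”, and “Direction” are illustrative names, not the VitalService API); the optional edge-type set mimics the type filtering available with Prime endpoints, and segment selection is left out.

```groovy
// Hypothetical sketch of a bounded expansion query; not the VitalService API.
enum Direction { FORWARD, BACKWARD, BOTH }

class Edge {
    String src
    String dst
    String type
}

// Breadth-first search from the start nodes, at most maxHops edges deep.
// Returns the start nodes plus every node reached. A non-null edgeTypes set
// restricts which edges may be followed.
Set<String> expand(Collection<String> start, List<Edge> edges, int maxHops,
                   Direction dir, Set<String> edgeTypes = null) {
    Set<String> seen = new LinkedHashSet<String>(start)
    Set<String> frontier = new LinkedHashSet<String>(start)
    for (int hop = 0; hop < maxHops && frontier; hop++) {
        Set<String> reached = new LinkedHashSet<String>()
        for (Edge e : edges) {
            if (edgeTypes != null && !(e.type in edgeTypes)) continue
            // Forward: follow src -> dst; backward: follow dst -> src.
            if (dir != Direction.BACKWARD && e.src in frontier) reached << e.dst
            if (dir != Direction.FORWARD && e.dst in frontier) reached << e.src
        }
        reached.removeAll(seen)   // keep only nodes not seen on an earlier hop
        seen.addAll(reached)
        frontier = reached
    }
    return seen
}

// Made-up example graph.
def edges = [
    new Edge(src: 'dog', dst: 'canine', type: 'hypernym'),
    new Edge(src: 'canine', dst: 'carnivore', type: 'hypernym'),
    new Edge(src: 'carnivore', dst: 'mammal', type: 'hypernym'),
    new Edge(src: 'puppy', dst: 'dog', type: 'hypernym'),
    new Edge(src: 'dog', dst: 'bark', type: 'related'),
]
def twoHopsForward = expand(['dog'], edges, 2, Direction.FORWARD)
def oneHopBoth = expand(['dog'], edges, 1, Direction.BOTH)
def typed = expand(['dog'], edges, 2, Direction.FORWARD, ['hypernym'] as Set)
```

With two forward hops from “dog”, “mammal” is not reached because it is three edges away; with one hop in both directions, “puppy” is reached through its incoming edge.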


And now we have some results!


Let’s zoom in on part of the network.  We can then further analyze the results, continue to explore and expand the network, or tune the visualization.




If we are connected to a Prime endpoint, we can use the Paths tab to select the node and edge types to filter with during an expansion query.



Also with a Prime endpoint, we can use the “DataScripts” tab to run datascripts on the server. Datascripts can be used to analyze data, trigger Spark or Hadoop jobs, apply a prediction model, or do anything else you like.

Please send along any comments or questions. We hope you enjoy using Cytoscape and our plugin to visualize your data.