Data science notebooks like Beaker Notebook are a great way not only to explore and analyze data but also to record the steps, so that the next data scientist can reproduce the results just by clicking “Run”.
Few people like to spend time meticulously documenting their data analysis steps, so the more “self-documenting” a data science notebook is, the easier the work becomes.
Because Beaker Notebook can mix many programming languages, such as R, Python, and Groovy, within one notebook, most steps can be captured completely in a single notebook.
One case Beaker does not cover, however, is the command-line shell. This post uses Bash, the default on Mac OS X, but the same gap applies to other shells, including the Windows shell and other Unix shells such as “csh” or “tcsh”.
Oftentimes shell commands are used for running data manipulation programs (like awk, perl, or sed), or running compiler processes (like maven or ant).
At Vital AI we use shell commands to run “vitalsigns” which compiles a data model into code, which is then used in data analysis, database queries, and machine learning processes (running inside Apache Spark).
Running these within Beaker is convenient, since it avoids switching from Beaker to a terminal window and back, and it also documents the steps for reproducibility.
Fortunately, with a little helper class, it’s easy to run shell commands from Groovy cells in the Beaker Notebook.
The helper class is “RunBash.groovy”, and it can be found on GitHub here:
https://github.com/vital-ai/vital-data-utils/blob/master/src/main/groovy/ai/vital/data/utils/RunBash.groovy
Once a jar with this class is made available to Groovy via the Language Manager (see screenshot below), it can be used in a Groovy cell to run Bash scripts, like so:
    import ai.vital.data.utils.RunBash

    RunBash.enable() // hook .bash() to strings

    // bash script begins here, in a Groovy multi-line string:
    """
    echo \$VITAL_HOME
    vitalsigns generate -o \${VITAL_HOME}/domain-ontology/vital-samples-0.1.0.owl -or
    """ .bash() // this runs the script
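For readers curious how a `.bash()` method can be attached to strings at all, here is a minimal sketch of the general technique. It is not the actual RunBash source: the class name `BashSketch` and its `run` method are hypothetical, and the sketch assumes a `bash` executable is on the PATH. It uses Groovy's metaclass mechanism to add the method to `String`, then hands the script to `bash -c` through the standard Java `ProcessBuilder`.

```groovy
// Hypothetical stand-in for a RunBash-style helper; NOT the vital-data-utils implementation.
class BashSketch {

    // Adds a bash() method to every String via Groovy's metaclass mechanism.
    static void enable() {
        String.metaClass.bash = { -> BashSketch.run(delegate.toString()) }
    }

    // Runs the script with `bash -c`, prints the combined output, and returns it.
    static String run(String script) {
        Process proc = new ProcessBuilder('bash', '-c', script)
                .redirectErrorStream(true)      // merge stderr into stdout
                .start()
        String output = proc.inputStream.text   // read until the process closes its output
        int exitCode = proc.waitFor()
        print output
        if (exitCode != 0) {
            println "bash exited with code ${exitCode}"
        }
        return output
    }
}

BashSketch.enable()

// Triple single quotes make a plain String, so $HOME is left for bash to expand.
'''
echo "home directory: $HOME"
'''.bash()
```

In the original snippet, the backslashes in `\$VITAL_HOME` serve the same purpose as the single quotes here: they stop Groovy from interpolating the variable inside a double-quoted string, so that bash sees `$VITAL_HOME` and expands it itself.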
Here’s a screenshot of that running in Beaker:
Here’s a screenshot of the Language Manager (from the Notebook menu), with jars added to the Beaker Notebook classpath for Groovy.