Customise Applications

Although standard, unmodified containers can be deployed through Nuvla, you have seen that this is rather limiting because input parameters cannot be provided and information generated within the container (such as passwords) cannot be recovered.

Because of these limitations, most containers used from Nuvla have been customised. The good news is that customising images is easy, especially as Docker allows you to build directly from existing containers.

Contextualisation

Nuvla provides contextualisation information to deployed containers in two ways: through environmental variables and via the standard Docker API.

Information Passed Into Container

A minimal set of environmental variables are passed to all deployed containers. These are:

NUVLA_ENDPOINT: The endpoint of the Nuvla service that deployed the container.
NUVLA_DEPLOYMENT_ID: The deployment identifier of the deployment, taking the form of “deployment/uuid” where the “uuid” is a the full UUID of the deployment.
NUVLA_API_KEY: The API key of the API key/secret pair.
NUVLA_API_SECRET: The API secret of the API key/secret pair.

These environment variables allow the running container to access the server to update the deployment state (e.g. to provide output parameter values) and to access information contained in the deployment (e.g. detailed information about requested data objects).

NOTE: The API key/secret pair is a unique credential generated for each deployment. This allows the deployment to access the Nuvla service with the rights of the user. This credential is independent of the user’s other credentials and can be revoked at any time. The credential is always deleted when the deployment terminates.

Docker API

Nuvla extracts certain information from the deployment and passes this information to Docker via its API when starting the container.

An important class of parameters extracted from the Nuvla application definition is the set of ports exposed by the container. These ports are mapped dynamically by Docker to ephemeral ports. The actual port used can be discovered using the output parameters. For example, if the tcp port 80 is exposed by the container, the actual port used will be in the “tcp.80” output parameter. These parameters are useful when defining the URL of the deployed service.

The second class of information extracted from the Nuvla application definition relates to mounted volumes. When mount points are found in selected data-record resources, they are extracted and passed to Docker when starting the container. The associated data should appear automatically within the container after it has started.

Information Passed Out of Container

Very frequently a running container must pass some information back to the user. A container generating a random password or token for accessing services within a container is a typical example.

To do this, you can provide a list of output parameters when you define an application. Values for these output parameters can then be set by the container when it runs. Setting the values of output parameters can be done most easily via the Nuvla Python client API. See below for details.

Uploading Containers

Nuvla can use any image that has been uploaded to any Docker repository. By default, Nuvla expects Docker Hub. If you have an account on Docker Hub, you can create an organisation to hold your “repositories”, each of which holds multiple tagged versions of an image.

The process is straightforward:

Use the docker login command to log into the Docker Hub.
Build your image with docker build.
Tag for image with docker tag, providing a tag name.
Upload the image with docker push.

At this point, the image will be visible in the Docker Hub and can then be used from Nuvla.

If your Docker image is protected by a username/password, just create a Docker Registry credential and reference these credentials in the app. Nuvla will then use these credentials on behalf of the user when deploying the protected Docker images.

To use this API you must ensure that your customised image contains Python (2.7+ or 3.x). You can then install the Nuvla Python API with the command pip install nuvla-api.

Customised Jupyter Notebook

The Jupyter notebook demo contains the following customisations:

Installation of Python, pip, and the Nuvla Python API.
Accessing of the deployment resource to recover detailed information about referenced data-record resources.
Generating a token to protect the deployed notebook and passing this information back to the user through an output parameter.

We will show how each of these customisations are accomplished in the Dockerfile and in a Python script added to the container.

The Dockerfile used to generate the customised Jupyter Notebook is duplicated below:

FROM sixsq/gssc-sepp:latest

RUN apt-get update && apt-get install -y python python-pip

RUN pip install nuvla-api

ADD link-data.py /root/link-data.py
RUN chmod a+x /root/link-data.py

ADD start-service.sh /root/start-service.sh
RUN chmod a+x /root/start-service.sh

ENTRYPOINT ["/root/start-service.sh"]

This Dockerfile is short and straightforward. It:

Installs Python, pip, and the Nuvla Python API,
Adds a Python script (link-data.py) that organises the referenced data objects, and
Adds a script (start-service.sh) that replaces the entry point of the parent image.

All the detailed changes are encapsulated in the two included scripts.

The simpler of the two scripts is start-service.sh. The contents are shown below:

#!/bin/bash -xe

# create links for requested data objects
/root/link-data.py

token=$(cat /root/token.txt)

# start service in the foreground with logging to console
cd /gssc
jupyter lab --ip=0.0.0.0 --allow-root --no-browser --NotebookApp.token=$token

This script calls the link-data.py script, recovers the generated token to access the notebook, and then starts Jupyter Notebook using this token.

The slightly more complicated script link-data.py is duplicated below:

#!/usr/bin/env python

import sys
import os
import random
import string
import uuid
from nuvla.api import Api

deployment_params_filter="deployment/href='{}' and name='{}'"

#
# Read the configuration from the environment.
#

endpoint = os.getenv('NUVLA_ENDPOINT')
api_key = os.getenv('NUVLA_API_KEY')
api_secret = os.getenv('NUVLA_API_SECRET')
deployment_id = os.getenv('NUVLA_DEPLOYMENT_ID')

#
# Ensure complete environment or bail out.
#

if (endpoint is None or
    api_key is None or
    api_secret is None or
    deployment_id is None):
  print("missing required configuration information; skipping data link configuration...")
  sys.exit()

#
# Setup the Nuvla API.
#

api = Api(endpoint=endpoint, insecure=True)
api.login_apikey(api_key, api_secret)

# Recover deployment information. 

deployment = api.get(deployment_id)

try:
  data_records = deployment.data['data-records']
except KeyError:
  data_records = {}
  
#
# setup directories for object links
#

buckets_base_path = '/mnt/'
if not os.path.exists(buckets_base_path):
  os.makedirs(buckets_base_path)

data_path='/gssc/data/nuvla/'
if not os.path.exists(data_path):
  os.makedirs(data_path)

#
# mount the buckets containing the requested objects
#

for dr in data_records.keys():
  dr_doc = api.get(dr)
  dr_bucket = dr_doc.data['data:bucket']
  dr_object = dr_doc.data['data:object']

  dr_mission = dr_doc.data['gnss:mission']

  bucket_mount_point = buckets_base_path + dr_bucket
  
  if not os.path.exists(bucket_mount_point):
    os.makedirs(bucket_mount_point)

  data_directory = data_path + dr_mission
  
  if not os.path.exists(data_directory):
    os.makedirs(data_directory)

  os.system('ln -s {0}/{1} {4}{3}/{2}__{1}'.format(bucket_mount_point, dr_object, dr_bucket, dr_mission, data_path))

#
# generate token for jupyter
#

token=''.join(random.choice(string.ascii_lowercase + string.digits) for _ in range(24))

token_file = open('/root/token.txt', 'w')
token_file.write(token)
token_file.close()

def from_data_uuid(text):
    class NullNameSpace:
        bytes = b''

    return str(uuid.uuid3(NullNameSpace, text))

def deployment_param_href(deployment_id, node_id, param_name):
        param_id = ':'.join(item or '' for item in [deployment_id, node_id, param_name])
        return 'deployment-parameter/' + from_data_uuid(param_id)

param_id = deployment_param_href(deployment_id, deployment_id.split('/')[1], 'jupyter-token')

api.edit(param_id, {'value': token})

The first part of the script simply includes the Nuvla Python API and a few other dependencies, ensures that the expected environmental variables are set, and then initialises the Nuvla Python client. This is standard boilerplate code that can be duplicated for any customised image.

The second section reads the data object metadata and provides links to the directory where the Jupyter Notebook expects to find files for analysis. Additionally, the code organises the links into a hierarchy based on the metadata; this organisation is a more logical organisation of these files for users in this scientific domain. Other domains will have their own metadata and will want to organise their files differently.

The last section generates a random access token, saves this in a file for the start-service.sh script, and then writes this value to the jupyter-token output parameter. This pattern can be repeated for any output parameters that you define for your containers.

Hands On Exercises

Choose an image to customise via output parameters and/or access to data managed through Nuvla. Use the Jupyter Notebook example as a guideline for performing this customisation.

You might consider using one of the following images:

nginx: Mount data selected through Nuvla into the web server’s root directory.
rocker/rstudio-stable: Generate a random password for the image and publish it via an output parameter.

In all cases, you’ll need to perform the following tasks:

Create a Dockerfile to incorporate your changes into a new image.
Build the container and test the image locally.
Upload the image to a public repository.
Create an app within Nuvla that references your modified container, adding output parameter definitions when appropriate.
Launch the application through Nuvla and verify that the service behaves as expected.

If the container doesn’t start as you expect, you may need to access the logs from Docker Swarm directly to help with the debugging.