Managing Data-Object Resources with the API

The following example shows how to create and populate a data-object resource and the associated S3 object.

NOTE: You must provide the correct endpoint and credentials, along with any imports not shown. The example also expects the NUVLA_ENDPOINT environment variable to be set and the variable s3_credential_id to hold the identifier of an existing S3 credential.

import hashlib
import json
import random
import requests
import string

from os import listdir, environ
from os.path import isfile, join

from nuvla.api import Api as nuvla_Api

nuvla_api = nuvla_Api(environ['NUVLA_ENDPOINT'], insecure=True)

nuvla_api.login_password('your-username', 'your-password')

bucket = 'new-bucket-for-tests'
object = 'new-object-for-tests'

#
# function to create a file with random contents
#

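def random_file(size):
    # build a string of `size` random lowercase ASCII letters
    chars = ''.join(random.choice(string.ascii_lowercase) for _ in range(size))
    # name the file after the SHA-1 digest of its contents
    filename = "%s.txt" % hashlib.sha1(chars.encode('utf-8')).hexdigest()
    with open(filename, 'w') as f:
        f.write(chars)
    return filename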

file_size = 1024
filename = random_file(file_size)

#
# create a data-object
#

data = {"name": "data-object-1",
        "description": "data object 1 with random data",
        "template": {
            "credential": s3_credential_id,
            "type": "generic",
            "resource-type": "data-object-template",
            "content-type": "text/plain",
            "object": object,
            "bucket": bucket,
            "href": "data-object-template/generic"
        }
}

response = nuvla_api.add('data-object', data)
data_object_id = response.data['resource-id']
print("data-object id: %s\n" % data_object_id)

#
# upload the file contents
#

print("UPLOAD ACTION")
data_object = nuvla_api.get(data_object_id)
response = nuvla_api.operation(data_object, "upload")
upload_url = response.data['uri']
print("upload_url: %s\n" % upload_url)

with open(filename, 'rb') as f:
    body = f.read()
headers = {"content-type": "text/plain"}
response = requests.put(upload_url, data=body, headers=headers)
print(response)

#
# mark the object as ready
#

print("READY ACTION")
data_object = nuvla_api.get(data_object_id)
response = nuvla_api.operation(data_object, "ready")
print(response)

#
# download the file
#

print("DOWNLOAD ACTION")
data_object = nuvla_api.get(data_object_id)
response = nuvla_api.operation(data_object, "download")
download_url = response.data['uri']
print("download_url: %s\n" % download_url)

response = requests.get(download_url, headers=headers)
print(response)
print(response.text)
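
After the download it can be useful to confirm that the bytes that came back are the bytes that were uploaded. The following is a minimal sketch of such a check; the helper names sha1_hex and contents_match are illustrative and not part of the Nuvla API.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the hexadecimal SHA-1 digest of a byte string."""
    return hashlib.sha1(data).hexdigest()


def contents_match(uploaded: bytes, downloaded: bytes) -> bool:
    """Return True when both byte strings have the same SHA-1 digest."""
    return sha1_hex(uploaded) == sha1_hex(downloaded)


# Usage with the variables from the example above (not executed here):
#   assert contents_match(body, response.content)
```

Here body is the content sent with the PUT request and response.content is the raw body of the GET response returned by requests.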

Managing Data-Record Resources with the API

To create a data-record resource via the Python API, use code similar to the following example.

NOTE: You must provide the correct endpoint, username, and password for your Nuvla server, along with any imports not shown. The example also relies on values defined elsewhere: swarm_id, data_object_id, bucket, object, and file_size, plus the INFRA_IP environment variable. The contents of the data-record will depend on your use case.

from datetime import datetime
from os import environ

from nuvla.api import Api as nuvla_Api

nuvla_api = nuvla_Api(environ['NUVLA_ENDPOINT'], insecure=True)
nuvla_api.login_password('your-username', 'your-password')

# UTC timestamp with one-second precision and a trailing "Z"
current_date = '%sZ' % datetime.utcnow().replace(microsecond=0).isoformat()

data = {
    "infrastructure-service": swarm_id,
    
    "name": "data-object-1",
    "description": "data-object-1 description",
    
    "resource:type": "DATA",
    "resource:protocol": "NFS",
    "resource:object": data_object_id,
    
    "data:bucket": bucket,
    "data:object": object,
    "data:contentType": "text/plain",
    "data:timestamp": current_date,

    "data:bytes": file_size,
    
    "data:nfsDevice": "/nfs-root",
    "data:nfsIP": environ['INFRA_IP'],
    
    "data:protocols": [
        "tcp+nfs"
    ],

    "gnss:mission": "random",
    
    "acl": {
        "owner": {
            "type": "ROLE",
            "principal": "ADMIN"
        },
        "rules": [
            {
                "right": "VIEW",
                "type": "ROLE",
                "principal": "USER"
            },
            {
                "type": "ROLE",
                "principal": "ADMIN",
                "right": "ALL"
            }
        ]
    }    
}

response = nuvla_api.add('data-record', data)
data_record_id = response.data['resource-id']
print("data-record id: %s\n" % data_record_id)

Managing Data-Set Resources with the API

To create a data-set resource via the Python API, use code similar to the following example. You must provide the correct endpoint, username, and password for your Nuvla server.

NOTE: Not all imports are listed in the example.

import os

from nuvla.api import Api as nuvla_Api

nuvla_api = nuvla_Api(os.environ['NUVLA_ENDPOINT'], insecure=True)
nuvla_api.login_password('your-username', 'your-password')

data_set = {"name": "GREAT (CLK)",
            "description": "GREAT (CLK) data at ESA",
            "module-filter": "data-accept-content-types='application/x-clk'",
            "data-record-filter": "gnss:mission='great' and data:contentType='application/x-clk'"}

data_set_response = nuvla_api.add('data-set', data_set)
data_set_id = data_set_response.data['resource-id']
print("data-set id: %s\n" % data_set_id)

Demonstration: Managing data

Data Objects/Records

The full lifecycle of a data object consists of the following steps:

1. Create a new data-object resource in Nuvla.
2. Use the “upload” action to obtain an upload URL for the data.
3. Upload the data using the URL.
4. Mark the data-object as “ready” and read-only via the “ready” action.
5. Use the “download” action to obtain a download URL.
6. Download the data using the URL.
7. Delete the data-object to remove the Nuvla resource and the backing S3 object.

All these actions can be completed with either the Python API or directly with the REST API, for example, with curl.

In parallel, you can create optional data-record resources, which associate enhanced, domain-specific metadata with the data objects. On infrastructures that expose data objects via both S3 and POSIX, the data-record resources can also carry the POSIX access information.

The following script walks through this workflow up to the download step using the Nuvla Python API, and then creates an associated data-record.

#!/usr/bin/env python

#
# Creates a data-object resource and associated data-record.
#
# The following environment variables must be defined:
#
# NUVLA_ENDPOINT: endpoint of Nuvla server
# NUVLA_USERNAME: username to access Nuvla
# NUVLA_PASSWORD: password to access Nuvla
#
# NUVLA_DATA_BUCKET: name of S3 bucket
# NUVLA_DATA_OBJECT: name of S3 object
#
# SWARM_NFS_IP: IP address of NFS server on Swarm cluster
#

from datetime import datetime
import hashlib
import random
import requests
import string

#import logging
#logging.basicConfig(level=logging.DEBUG)


from os import listdir, environ, remove

from nuvla.api import Api as nuvla_Api

nuvla_api = nuvla_Api(environ['NUVLA_ENDPOINT'], insecure=True)

nuvla_api.login_password(environ['NUVLA_USERNAME'], environ['NUVLA_PASSWORD'])

bucket = environ['NUVLA_DATA_BUCKET']
object = environ['NUVLA_DATA_OBJECT']

#
# get the s3 infrastructure-service
#

response = nuvla_api.search('infrastructure-service', filter="subtype='s3'")
s3_service = response.data['resources'][0]
s3_id = s3_service['id']
s3_endpoint = s3_service['endpoint']

print('S3 ID: %s' % s3_id)
print('S3 ENDPOINT: %s' % s3_endpoint)

#
# get the credential for s3
#

response = nuvla_api.search('credential', filter="parent='%s'" % s3_id)
s3_credential = response.data['resources'][0]
s3_credential_id = s3_credential['id']

print('CREDENTIAL ID: %s' % s3_credential_id)
print(s3_credential)

#
# get the swarm infrastructure-service
#

response = nuvla_api.search('infrastructure-service', filter="subtype='swarm'")
swarm_service = response.data['resources'][0]
swarm_id = swarm_service['id']
swarm_endpoint = swarm_service['endpoint']

print('SWARM ID: %s' % swarm_id)
print('SWARM ENDPOINT: %s' % swarm_endpoint)

#
# function to create a file with random contents
# (text is lowercase characters, "binary" is uppercase characters)
#

def random_text_file(size):
    # `size` random lowercase ASCII letters, named after their SHA-1 digest
    chars = ''.join(random.choice(string.ascii_lowercase) for _ in range(size))
    filename = "%s.txt" % hashlib.sha1(chars.encode('utf-8')).hexdigest()
    with open(filename, 'w') as f:
        f.write(chars)
    return filename

def random_binary_file(size):
    # `size` random uppercase ASCII letters, named after their SHA-1 digest
    chars = ''.join(random.choice(string.ascii_uppercase) for _ in range(size))
    filename = "%s.txt" % hashlib.sha1(chars.encode('utf-8')).hexdigest()
    with open(filename, 'w') as f:
        f.write(chars)
    return filename

#
# Create a timestamp to associate with the data
#

timestamp = '%s.00Z' % datetime.utcnow().replace(microsecond=0).isoformat()

location_geneva = [6.143158, 46.204391, 373.0]
location_lyon = [4.835659, 45.764043, 197.0]

#
# create a data-object
#

data = {"name": "data-object-1",
        "description": "data object 1 with random data",
        "template": {
            "href": "data-object-template/generic",
            "type": "generic",
            "resource-type": "data-object-template",
            "credential": s3_credential_id,
            "timestamp": timestamp,
            "location": location_geneva,
#            "content-type": "application/octet-stream",
            "content-type": "text/plain",
            "bucket": bucket,
            "object": object
        }
}

print(data)

response = nuvla_api.add('data-object', data)
data_object_id = response.data['resource-id']
print("data-object id: %s\n" % data_object_id)

#
# upload the file contents
#

print("UPLOAD ACTION")
data_object = nuvla_api.get(data_object_id)
response = nuvla_api.operation(data_object, "upload")
upload_url = response.data['uri']
print("upload_url: %s\n" % upload_url)

file_size = random.randint(1, 8096)
filename = random_text_file(file_size)

with open(filename, 'rb') as f:
    body = f.read()
headers = {"content-type": "text/plain"}
response = requests.put(upload_url, data=body, headers=headers)
print(response)

remove(filename)

#
# mark the object as ready
#

print("READY ACTION")
data_object = nuvla_api.get(data_object_id)
response = nuvla_api.operation(data_object, "ready")
print(response)

#
# download the file
#

print("DOWNLOAD ACTION")
data_object = nuvla_api.get(data_object_id)
response = nuvla_api.operation(data_object, "download")
download_url = response.data['uri']
print("download_url: %s\n" % download_url)

response = requests.get(download_url, headers=headers)
print(response)
print(response.text)

#
# create data-record
#

# FIXME: This should point to S3 service rather than SWARM.

data = {
    "infrastructure-service": swarm_id,
  
    "name": object,
    "description": "data-object-1 description",

    "content-type": "text/plain",
    "timestamp": timestamp,
    "location": location_geneva,

    "bytes": file_size,

    "mount": {"mount-type": "volume",
              "target": '/mnt/%s' % bucket,
              "volume-options": {"type": "nfs",
                                 "o": 'addr=%s' % environ['SWARM_NFS_IP'],
                                 "device": ':/nfs-root/%s' % bucket}},
            
    "gnss:mission": "random",
  
    "acl": {
        "owners": ["group/nuvla-admin"],
        "view-acl": ["group/nuvla-user"]
    }
}

response = nuvla_api.add('data-record', data)
data_record_id = response.data['resource-id']
print("data-record id: %s\n" % data_record_id)
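
The script above stops before the last lifecycle step, deleting the resources. A minimal sketch of that cleanup is shown below. It assumes that api is an authenticated nuvla.api.Api instance and that its delete method takes a resource identifier; the helper name delete_resources is illustrative.

```python
def delete_resources(api, resource_ids):
    """Delete each resource in order and return the identifiers processed.

    `api` is assumed to be an authenticated nuvla.api.Api instance whose
    delete(resource_id) method removes a single resource.
    """
    deleted = []
    for resource_id in resource_ids:
        api.delete(resource_id)
        deleted.append(resource_id)
    return deleted


# Usage with the variables from the script above (not executed here):
#   delete_resources(nuvla_api, [data_record_id, data_object_id])
```

Deleting the data-record first is a choice made for this sketch so that no record is left describing an object that no longer exists; the source does not prescribe an order.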

NOTE: The process of creating an object will also create the S3 bucket if it doesn’t exist. Similarly, the S3 bucket will be removed if the last object is removed from it.

NOTE: The visibility of the data objects and records is determined by the ACL. Change the ACL to share data with other users.
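
As an illustration of changing the ACL, the sketch below adds a principal to the "view-acl" list used in the data-record above. The helper name share_for_viewing and the principal "user/hypothetical-id" are made up for this example, and whether a given resource accepts the update depends on its state and on your rights.

```python
import copy


def share_for_viewing(acl, principal):
    """Return a copy of `acl` with `principal` added to its "view-acl" list."""
    updated = copy.deepcopy(acl)
    viewers = updated.setdefault("view-acl", [])
    if principal not in viewers:
        viewers.append(principal)
    return updated


acl = {"owners": ["group/nuvla-admin"], "view-acl": ["group/nuvla-user"]}
new_acl = share_for_viewing(acl, "user/hypothetical-id")

# To apply the change (not executed here), assuming `nuvla_api` and
# `data_record_id` from the script above:
#   nuvla_api.edit(data_record_id, {"acl": new_acl})
```

The helper works on a copy, so the original ACL dictionary is left untouched until the edited version is sent to the server.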

Data Sets

Once you have more than a few data objects, working with them individually becomes tedious. Instead you would usually group those objects (and records) into data sets.

Nuvla has a data-set resource exactly for this purpose. Via the standard filtering syntax, you can create dynamic definitions of data sets.

The following script shows how this can be accomplished.

import os

#
# Creates data-set definitions that select the random text and
# random binary data-records.
#
# The following environment variables must be defined:
#
# NUVLA_ENDPOINT: endpoint of Nuvla server
# NUVLA_USERNAME: username to access Nuvla
# NUVLA_PASSWORD: password to access Nuvla
#

from nuvla.api import Api as nuvla_Api

nuvla_api = nuvla_Api(os.environ['NUVLA_ENDPOINT'], insecure=True)

nuvla_api.login_password(os.environ['NUVLA_USERNAME'], os.environ['NUVLA_PASSWORD'])

#
# Add dataset definitions.
#

data_set = {"name": "Random Text",
            "description": "Collection of files containing random text",
            "module-filter": "data-accept-content-types='text/plain'",
            "data-record-filter": "gnss:mission='random' and content-type='text/plain'"}

data_set_response = nuvla_api.add('data-set', data_set)
data_set_id = data_set_response.data['resource-id']
print("data-set id: %s\n" % data_set_id)

data_set = {"name": "Random Binary",
            "description": "Collection of files containing random binary data",
            "module-filter": "data-accept-content-types='application/octet-stream'",
            "data-record-filter": "gnss:mission='random' and content-type='application/octet-stream'"}

data_set_response = nuvla_api.add('data-set', data_set)
data_set_id = data_set_response.data['resource-id']
print("data-set id: %s\n" % data_set_id)
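
The data-record-filter values in the script above are plain strings that combine equality tests with "and". When the conditions come from program data, a small helper can assemble the string. The sketch below covers only that simple case; the name build_filter is illustrative, and values containing single quotes are not escaped.

```python
def build_filter(conditions):
    """Join attribute/value pairs into an "and"-combined equality filter."""
    return " and ".join(
        "%s='%s'" % (key, value) for key, value in conditions.items()
    )


record_filter = build_filter(
    {"gnss:mission": "random", "content-type": "text/plain"}
)
print(record_filter)
# prints: gnss:mission='random' and content-type='text/plain'
```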

Once the data sets are created, they should be visible in the “data” section of the Nuvla UI. Check that they are visible and that they contain the objects that you expect.
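
The same check can be sketched with the Python API by running the data-set's own data-record-filter as a search. The helper name records_in_data_set is illustrative. The sketch assumes that api is an authenticated nuvla.api.Api instance and that the object returned by get exposes the resource document through a data attribute, as the responses in the scripts above do.

```python
def records_in_data_set(api, data_set_id):
    """Return the data-record resources selected by a data-set's filter."""
    data_set = api.get(data_set_id)
    record_filter = data_set.data["data-record-filter"]
    response = api.search("data-record", filter=record_filter)
    return response.data["resources"]


# Usage with the variables from the script above (not executed here):
#   for record in records_in_data_set(nuvla_api, data_set_id):
#       print(record["id"], record.get("name"))
```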

NOTE: The visibility of the data sets is determined by the ACL. Change the ACL to share data with other users.