Managing Data-Object Resources with the API
The following example shows how to create and populate a data-object
resource and the associated S3 object.
NOTE: Not all imports are listed in the example, and you must provide the correct endpoint and credentials. Some variables, such as s3_credential_id, hold external information that you must supply.
import hashlib
import json
import random
import requests
import string
from os import listdir, environ
from os.path import isfile, join
from nuvla.api import Api as nuvla_Api
nuvla_api = nuvla_Api(environ['NUVLA_ENDPOINT'], insecure=True)
nuvla_api.login_password('your-username', 'your-password')
bucket = 'new-bucket-for-tests'
object = 'new-object-for-tests'
#
# function to create a file with random contents
#
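def random_file(size):
    # string.ascii_lowercase exists in both Python 2 and Python 3
    chars = ''.join([random.choice(string.ascii_lowercase) for i in range(size)])
    # hashlib needs bytes, so encode the text before hashing
    filename = "%s.txt" % hashlib.sha1(chars.encode('ascii')).hexdigest()
    with open(filename, 'w') as f:
        f.write(chars)
    return filename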
file_size = 1024
filename = random_file(file_size)
#
# create a data-object
#
data = {"name": "data-object-1",
        "description": "data object 1 with random data",
        "template": {
            "credential": s3_credential_id,
            "type": "generic",
            "resource-type": "data-object-template",
            "content-type": "text/plain",
            "object": object,
            "bucket": bucket,
            "href": "data-object-template/generic"
        }
        }
response = nuvla_api.add('data-object', data)
data_object_id = response.data['resource-id']
print("data-object id: %s\n" % data_object_id)
#
# upload the file contents
#
print("UPLOAD ACTION")
data_object = nuvla_api.get(data_object_id)
response = nuvla_api.operation(data_object, "upload")
upload_url = response.data['uri']
print("upload_url: %s\n" % upload_url)
body = open(filename, 'rb').read()
headers = {"content-type": "text/plain"}
response = requests.put(upload_url, data=body, headers=headers)
print(response)
#
# mark the object as ready
#
print("READY ACTION")
data_object = nuvla_api.get(data_object_id)
response = nuvla_api.operation(data_object, "ready")
print(response)
#
# download the file
#
print("DOWNLOAD ACTION")
data_object = nuvla_api.get(data_object_id)
response = nuvla_api.operation(data_object, "download")
download_url = response.data['uri']
print("download_url: %s\n" % download_url)
response = requests.get(download_url, headers=headers)
from pprint import pprint
pprint(response)
print(response.text)
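Because the example names the file after the SHA-1 digest of its contents, the downloaded bytes can be checked against the upload. The following is a minimal sketch of such a check. The helper names `sha1_hex` and `matches_filename` are hypothetical (they are not part of the Nuvla API), and the sketch assumes the `<digest>.txt` naming scheme used by `random_file` above.

```python
import hashlib
import os


def sha1_hex(content):
    """Return the hexadecimal SHA-1 digest of a bytes object."""
    return hashlib.sha1(content).hexdigest()


def matches_filename(filename, content):
    """Check that content hashes to the digest embedded in filename.

    Assumes the '<sha1-hex-digest>.txt' naming scheme of the example.
    """
    expected = os.path.splitext(os.path.basename(filename))[0]
    return sha1_hex(content) == expected


# Stand-in for the bytes returned by the download request (response.content).
downloaded = b"abcdefghij"
name = "%s.txt" % sha1_hex(downloaded)
print(matches_filename(name, downloaded))          # True
print(matches_filename(name, downloaded + b"x"))   # False
```

In the example above, the check would be called with `filename` and `response.content` after the download request.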
Managing Data-Record Resources with the API
To create a data-record resource via the Python API, use code similar to the following example. You must provide the correct endpoint, username, and password for your Nuvla server.
NOTE: Not all imports are listed in the example. Some variables, such as swarm_id and data_object_id, hold external information that you must supply. The contents of the data-record will depend on your use case.
from datetime import datetime
from os import environ
from nuvla.api import Api as nuvla_Api
nuvla_api = nuvla_Api(environ['NUVLA_ENDPOINT'], insecure=True)
nuvla_api.login_password('your-username', 'your-password')
current_date = '%sZ' % datetime.utcnow().replace(microsecond=0).isoformat()
data = {
    "infrastructure-service": swarm_id,
    "name": "data-object-1",
    "description": "data-object-1 description",
    "resource:type": "DATA",
    "resource:protocol": "NFS",
    "resource:object": data_object_id,
    "data:bucket": bucket,
    "data:object": object,
    "data:contentType": "text/plain",
    "data:timestamp": current_date,
    "data:bytes": file_size,
    "data:nfsDevice": "/nfs-root",
    "data:nfsIP": environ['INFRA_IP'],
    "data:protocols": [
        "tcp+nfs"
    ],
    "gnss:mission": "random",
    "acl": {
        "owner": {
            "type": "ROLE",
            "principal": "ADMIN"
        },
        "rules": [
            {
                "right": "VIEW",
                "type": "ROLE",
                "principal": "USER"
            },
            {
                "type": "ROLE",
                "principal": "ADMIN",
                "right": "ALL"
            }
        ]
    }
}
response = nuvla_api.add('data-record', data)
data_record_id = response.data['resource-id']
print("data-record id: %s\n" % data_record_id)
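The timestamp in the example is built from `datetime.utcnow()`, which returns a naive datetime and is deprecated as of Python 3.12. The following sketch shows one way to produce the same `...Z` string from a timezone-aware datetime. The function name `utc_timestamp` is hypothetical, and the sketch assumes second precision is sufficient, as in the example.

```python
from datetime import datetime, timezone


def utc_timestamp(now=None):
    """Format a timezone-aware datetime as an ISO 8601 UTC timestamp.

    Uses the current time when no datetime is given.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Normalise to UTC, then drop microseconds and the '+00:00' offset
    # so that a literal 'Z' can be appended instead.
    now = now.astimezone(timezone.utc).replace(microsecond=0, tzinfo=None)
    return "%sZ" % now.isoformat()


fixed = datetime(2019, 6, 1, 12, 30, 45, 123456, tzinfo=timezone.utc)
print(utc_timestamp(fixed))   # 2019-06-01T12:30:45Z
```

Passing the time in as an argument keeps the function easy to test with a fixed value.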
Managing Data-Set Resources with the API
To create a data-set resource via the Python API, use code similar to the following example. You must provide the correct endpoint, username, and password for your Nuvla server.
NOTE: Not all imports are listed in the example.
import os
from nuvla.api import Api as nuvla_Api
nuvla_api = nuvla_Api(os.environ['NUVLA_ENDPOINT'], insecure=True)
nuvla_api.login_password('your-username', 'your-password')
data_set = {"name": "GREAT (CLK)",
            "description": "GREAT (CLK) data at ESA",
            "module-filter": "data-accept-content-types='application/x-clk'",
            "data-record-filter": "gnss:mission='great' and data:contentType='application/x-clk'"}
data_set_response = nuvla_api.add('data-set', data_set)
data_set_id = data_set_response.data['resource-id']
print("data-set id: %s\n" % data_set_id)
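The filter strings in the example are equality tests joined with `and`. When several data sets are defined, building these strings by hand invites quoting mistakes. The following is a small sketch that assembles them from attribute/value pairs. The helper name `equality_filter` is hypothetical, and the sketch covers only the `attr='value' and ...` form that appears in the example, not the full filter syntax.

```python
def equality_filter(conditions):
    """Join (attribute, value) pairs into "attr='value' and ..." form."""
    clauses = []
    for attribute, value in conditions:
        if "'" in value:
            # Quoting rules are not covered here; refuse rather than guess.
            raise ValueError("value must not contain a single quote: %r" % value)
        clauses.append("%s='%s'" % (attribute, value))
    return " and ".join(clauses)


record_filter = equality_filter([
    ("gnss:mission", "great"),
    ("data:contentType", "application/x-clk"),
])
print(record_filter)
# gnss:mission='great' and data:contentType='application/x-clk'
```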
Demonstration: Managing data
Data Objects/Records
The full lifecycle of a data object consists of the following steps:
- Create a new data-object resource in Nuvla,
- Use the “upload” action to obtain an upload URL for the data,
- Upload the data using the URL,
- Mark the data-object as “ready” and read-only via the “ready” action,
- Use the “download” action to obtain a download URL,
- Download the data using the URL,
- Delete the data-object to remove the Nuvla resource and the backing S3 object.
All these actions can be completed with either the Python API or directly with the REST API, for example, with curl.
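The lifecycle steps listed above can be sketched as a single function. This is an illustration under stated assumptions, not a definitive implementation: `run_lifecycle` is a hypothetical name, `api` is assumed to behave like the `nuvla.api.Api` client used elsewhere on this page (`get` and `operation`), and the `api.delete(resource_id)` call for the final step is an assumption, since this page does not show how the deletion is issued. The `FakeApi` and `FakeHttp` classes are in-memory stand-ins so that the flow can be run without a server; the URL they return is a placeholder.

```python
def run_lifecycle(api, http, data_object_id, payload, content_type="text/plain"):
    """Walk one data-object through upload, ready, download, and delete."""
    headers = {"content-type": content_type}

    # Obtain an upload URL, then PUT the bytes to it.
    upload_url = api.operation(api.get(data_object_id), "upload").data["uri"]
    http.put(upload_url, data=payload, headers=headers)

    # Mark the object as ready (and read-only).
    api.operation(api.get(data_object_id), "ready")

    # Obtain a download URL, then GET the bytes back.
    download_url = api.operation(api.get(data_object_id), "download").data["uri"]
    content = http.get(download_url).content

    # Assumption: the client exposes delete(resource_id) for the last step.
    api.delete(data_object_id)
    return content


class _Response(object):
    def __init__(self, data=None, content=b""):
        self.data = data
        self.content = content


class FakeApi(object):
    """In-memory stand-in for the Nuvla client; records the actions it sees."""

    def __init__(self):
        self.calls = []

    def get(self, resource_id):
        return resource_id

    def operation(self, resource, action):
        self.calls.append(action)
        return _Response(data={"uri": "https://s3.example.invalid/%s" % action})

    def delete(self, resource_id):
        self.calls.append("delete")


class FakeHttp(object):
    """In-memory stand-in for the requests module; stores what was PUT."""

    def __init__(self):
        self.stored = b""

    def put(self, url, data=None, headers=None):
        self.stored = data
        return _Response()

    def get(self, url, headers=None):
        return _Response(content=self.stored)


fake_api, fake_http = FakeApi(), FakeHttp()
result = run_lifecycle(fake_api, fake_http, "data-object/example", b"hello")
print(fake_api.calls)   # ['upload', 'ready', 'download', 'delete']
```

Passing the client and the HTTP module in as arguments keeps the order of the steps testable in isolation. With a real, logged-in client, the call would take the client object and the `requests` module in place of the stand-ins.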
In parallel, optional data-record resources can also be created. These resources associate enhanced, domain-specific metadata with the data objects. POSIX access information can be provided via data-record resources on infrastructures that expose data objects via S3 and via POSIX.
The following script walks through this workflow, from creation to download, and creates an associated data-record, using the Nuvla Python API. It does not delete the data-object.
#!/usr/bin/env python
#
# Creates a data-object resource and associated data-record.
#
# The following environmental variables can/must be defined:
#
# NUVLA_ENDPOINT: endpoint of Nuvla server (required by this script)
# NUVLA_USERNAME: username to access Nuvla
# NUVLA_PASSWORD: password to access Nuvla
#
# NUVLA_DATA_BUCKET: name of S3 bucket
# NUVLA_DATA_OBJECT: name of S3 object
#
# SWARM_NFS_IP: IP address of NFS server on Swarm cluster
#
from datetime import datetime
import hashlib
import random
import requests
import string
#import logging
#logging.basicConfig(level=logging.DEBUG)
from os import listdir, environ, remove
from nuvla.api import Api as nuvla_Api
nuvla_api = nuvla_Api(environ['NUVLA_ENDPOINT'], insecure=True)
nuvla_api.login_password(environ['NUVLA_USERNAME'], environ['NUVLA_PASSWORD'])
bucket = environ['NUVLA_DATA_BUCKET']
object = environ['NUVLA_DATA_OBJECT']
#
# get the s3 infrastructure-service
#
response = nuvla_api.search('infrastructure-service', filter="subtype='s3'")
s3_service = response.data['resources'][0]
s3_id = s3_service['id']
s3_endpoint = s3_service['endpoint']
print('S3 ID: %s' % s3_id)
print('S3 ENDPOINT: %s' % s3_endpoint)
#
# get the credential for s3
#
response = nuvla_api.search('credential', filter="parent='%s'" % s3_id)
s3_credential = response.data['resources'][0]
s3_credential_id = s3_credential['id']
print('CREDENTIAL ID: %s' % s3_credential_id)
print(s3_credential)
#
# get the swarm infrastructure-service
#
response = nuvla_api.search('infrastructure-service', filter="subtype='swarm'")
swarm_service = response.data['resources'][0]
swarm_id = swarm_service['id']
swarm_endpoint = swarm_service['endpoint']
print('SWARM ID: %s' % swarm_id)
print('SWARM ENDPOINT: %s' % swarm_endpoint)
#
# function to create a file with random contents
# (text is lowercase characters, "binary" is uppercase characters)
#
def random_text_file(size):
    # string.ascii_lowercase exists in both Python 2 and Python 3
    chars = ''.join([random.choice(string.ascii_lowercase) for i in range(size)])
    # hashlib needs bytes, so encode the text before hashing
    filename = "%s.txt" % hashlib.sha1(chars.encode('ascii')).hexdigest()
    with open(filename, 'w') as f:
        f.write(chars)
    return filename

def random_binary_file(size):
    chars = ''.join([random.choice(string.ascii_uppercase) for i in range(size)])
    filename = "%s.txt" % hashlib.sha1(chars.encode('ascii')).hexdigest()
    with open(filename, 'w') as f:
        f.write(chars)
    return filename
#
# Create a timestamp to associate with the data
#
timestamp = '%s.00Z' % datetime.utcnow().replace(microsecond=0).isoformat()
location_geneva = [6.143158, 46.204391, 373.0]
location_lyon = [4.835659, 45.764043, 197.0]
#
# create a data-object
#
data = {"name": "data-object-1",
        "description": "data object 1 with random data",
        "template": {
            "href": "data-object-template/generic",
            "type": "generic",
            "resource-type": "data-object-template",
            "credential": s3_credential_id,
            "timestamp": timestamp,
            "location": location_geneva,
            # "content-type": "application/octet-stream",
            "content-type": "text/plain",
            "bucket": bucket,
            "object": object
        }
        }
print(data)
response = nuvla_api.add('data-object', data)
data_object_id = response.data['resource-id']
print("data-object id: %s\n" % data_object_id)
#
# upload the file contents
#
print("UPLOAD ACTION")
data_object = nuvla_api.get(data_object_id)
response = nuvla_api.operation(data_object, "upload")
upload_url = response.data['uri']
print("upload_url: %s\n" % upload_url)
file_size = random.randint(1, 8096)
filename = random_text_file(file_size)
body = open(filename, 'rb').read()
headers = {"content-type": "text/plain"}
response = requests.put(upload_url, data=body, headers=headers)
print(response)
remove(filename)
#
# mark the object as ready
#
print("READY ACTION")
data_object = nuvla_api.get(data_object_id)
response = nuvla_api.operation(data_object, "ready")
print(response)
#
# download the file
#
print("DOWNLOAD ACTION")
data_object = nuvla_api.get(data_object_id)
response = nuvla_api.operation(data_object, "download")
download_url = response.data['uri']
print("download_url: %s\n" % download_url)
response = requests.get(download_url, headers=headers)
from pprint import pprint
pprint(response)
print(response.text)
#
# create data-record
#
# FIXME: This should point to S3 service rather than SWARM.
data = {
    "infrastructure-service": swarm_id,
    "name": object,
    "description": "data-object-1 description",
    "content-type": "text/plain",
    "timestamp": timestamp,
    "location": location_geneva,
    "bytes": file_size,
    "mount": {"mount-type": "volume",
              "target": '/mnt/%s' % bucket,
              "volume-options": {"type": "nfs",
                                 "o": 'addr=%s' % environ['SWARM_NFS_IP'],
                                 "device": ':/nfs-root/%s' % bucket}},
    "gnss:mission": "random",
    "acl": {
        "owners": ["group/nuvla-admin"],
        "view-acl": ["group/nuvla-user"]
    }
}
response = nuvla_api.add('data-record', data)
data_record_id = response.data['resource-id']
print("data-record id: %s\n" % data_record_id)
NOTE: The process of creating an object will also create the S3 bucket if it doesn’t exist. Similarly, the S3 bucket will be removed if the last object is removed from it.
NOTE: The visibility of the data objects and records is determined by the ACL. Change the ACL to share data with other users.
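The "mount" entry of the data-record in the script above carries the POSIX access information for the bucket. As a minimal sketch, that entry can be factored into a helper so that the target path and NFS device stay consistent across records. The function name `nfs_volume_mount` and its `nfs_root` and `mount_root` parameters are hypothetical; their defaults mirror the values hard-coded in the script, and the IP address is a documentation placeholder.

```python
def nfs_volume_mount(bucket, nfs_ip, nfs_root="/nfs-root", mount_root="/mnt"):
    """Build the "mount" entry of a data-record for an NFS-backed bucket."""
    return {
        "mount-type": "volume",
        "target": "%s/%s" % (mount_root, bucket),
        "volume-options": {
            "type": "nfs",
            "o": "addr=%s" % nfs_ip,
            "device": ":%s/%s" % (nfs_root, bucket),
        },
    }


mount = nfs_volume_mount("new-bucket-for-tests", "192.0.2.10")
print(mount["target"])                     # /mnt/new-bucket-for-tests
print(mount["volume-options"]["device"])   # :/nfs-root/new-bucket-for-tests
```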
Data Sets
Once you have more than a few data objects, working with them individually becomes tedious. Instead you would usually group those objects (and records) into data sets.
Nuvla has a data-set resource exactly for this purpose. Via the standard filtering syntax, you can create dynamic definitions of data sets.
The following script shows how this can be accomplished.
import os
#
# Creates data-set definitions for the random text and random
# binary data.
#
# The following environmental variables must be defined:
#
# NUVLA_ENDPOINT: endpoint of Nuvla server
# NUVLA_USERNAME: username to access Nuvla
# NUVLA_PASSWORD: password to access Nuvla
#
from nuvla.api import Api as nuvla_Api
nuvla_api = nuvla_Api(os.environ['NUVLA_ENDPOINT'], insecure=True)
nuvla_api.login_password(os.environ['NUVLA_USERNAME'], os.environ['NUVLA_PASSWORD'])
#
# Add dataset definitions.
#
data_set = {"name": "Random Text",
            "description": "Collection of files containing random text",
            "module-filter": "data-accept-content-types='text/plain'",
            "data-record-filter": "gnss:mission='random' and content-type='text/plain'"}
data_set_response = nuvla_api.add('data-set', data_set)
data_set_id = data_set_response.data['resource-id']
print("data-set id: %s\n" % data_set_id)
data_set = {"name": "Random Binary",
            "description": "Collection of files containing random binary data",
            "module-filter": "data-accept-content-types='application/octet-stream'",
            "data-record-filter": "gnss:mission='random' and content-type='application/octet-stream'"}
data_set_response = nuvla_api.add('data-set', data_set)
data_set_id = data_set_response.data['resource-id']
print("data-set id: %s\n" % data_set_id)
Once the data sets are created, they should be visible in the “data” section of the Nuvla UI. Check that they are visible and that they contain the objects that you expect.
NOTE: The visibility of the data sets is determined by the ACL. Change the ACL to share data with other users.
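The two definitions in the script above differ only in name, description, and content type. As a sketch, the payloads can be generated from a small table so that the module filter and the data-record filter always refer to the same content type. The function name `data_set_for` is hypothetical; the attribute names and filter strings are the ones used in the script.

```python
def data_set_for(name, description, content_type, mission="random"):
    """Build a data-set payload selecting records by mission and content type."""
    return {
        "name": name,
        "description": description,
        "module-filter": "data-accept-content-types='%s'" % content_type,
        "data-record-filter": "gnss:mission='%s' and content-type='%s'"
                              % (mission, content_type),
    }


definitions = [
    data_set_for("Random Text",
                 "Collection of files containing random text",
                 "text/plain"),
    data_set_for("Random Binary",
                 "Collection of files containing random binary data",
                 "application/octet-stream"),
]
for definition in definitions:
    print("%s: %s" % (definition["name"], definition["data-record-filter"]))
```

Each payload could then be passed to `nuvla_api.add('data-set', ...)` in a loop, as the script does for its two hand-written definitions.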