Secure Data Store Solution

Secure Data Store Services (SDSS) records selected data from the EFPF Data Spine so that the data can be retrieved and analyzed at a later time. It supports advanced authorization schemes for sharing data access within the EFPF environment.

Architecture

[Figure: SDSS Architecture]

Deployment (Administrators)

Overview

  • SDSS API - The main API endpoint
  • Dependencies
    • Mongo DB
    • Influx DB

Ideally, to support high availability, the dependent services should be deployed as clusters.

SDSS Config

Docker image: registry.fit.fraunhofer.de/efpf-pilots/secure-data-store-solution

| Env Var Name | Required | Description |
|---|---|---|
| MONGO_URL | Required | MongoDB connection URL. Example: mongodb://localhost:27017/ef-ds-1. Details on the format can be found at https://docs.mongodb.com/manual/reference/connection-string/ |
| INFLUX_HOST | Required | Hostname for InfluxDB |
| KC_CLIENT_ID | Required | App client ID from Keycloak |
| KC_CLIENT_SECRET | Required | App secret from Keycloak |
| KC_SERVER_URL | Default: https://efpf-security-portal.salzburgresearch.at/auth | Keycloak server URL |
| KC_REALM | Default: master | Keycloak realm |
| KC_RESOURCE | Default: secure-data-storage-api | Keycloak resource name |
| SESSION_SECRET | Required | Used to secure sessions within SDSS |
| PORT | Default: 8080 | Port number to accept connections on |
| SECURE | Default: enabled if port is not 8080 or 80 | Use 'true' to force HTTPS |
| CERT_PASSPHRASE | Not required | Used as the passphrase for the HTTPS certificate |
| CERT_PATH | Not required | Path to the certificate file, if HTTPS is used. If not provided, a built-in self-signed certificate is used. Note that this has security implications. |
| CERT_KEYPATH | Not required | Path to the private key file, if HTTPS is used. If not provided, a built-in self-signed certificate is used. Note that this has security implications. |

SDSS is accessible with HTTP(S)/REST. The UI can be found by requesting <hostname>:<port=8080>/ui in a web browser.

Sample command:

docker run -p 8080:8080 --env PORT=8080 --env MONGO_URL=mongodb://localhost:27017/ef-ds-1 --env INFLUX_HOST=localhost --env KC_CLIENT_ID=secure-data-storage-api --env KC_CLIENT_SECRET=XXXXXXXX   --env SESSION_SECRET=ChangeMe registry.fit.fraunhofer.de/efpf-pilots/secure-data-store-solution
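
Once the container is running, the UI endpoint can be checked from the host. A minimal sketch, assuming the default port 8080 and plain HTTP (SECURE not set):

## Quick reachability check (assumes port 8080 and plain HTTP)
curl http://localhost:8080/ui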

Notes on MongoDB

SDSS was developed against MongoDB version 4.

A simple deployment with a single instance can be accomplished with the Docker image mongo:4, for which documentation can be found at https://hub.docker.com/_/mongo.

For access, SDSS typically connects to MongoDB's default port 27017, but this can be changed through the MongoDB connection URL passed to SDSS in the environment variable MONGO_URL, which should also specify the host and database.

For data persistence, the path /data/db inside the Docker image should be mounted to the host filesystem.
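
For illustration, a single-instance MongoDB container with persistent storage could be started as follows. This is a minimal sketch; the host path c:/test/mongo and the published port are examples:

## Single MongoDB instance; host path is an example
docker run --name mongodb -d -p 27017:27017 -v c:/test/mongo:/data/db mongo:4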

Notes on InfluxDB

SDSS was developed against InfluxDB version 1.8.

A simple deployment with a single instance can be accomplished with the Docker image influxdb:1.8, for which documentation can be found at https://hub.docker.com/_/influxdb.

For access, SDSS expects to connect to the Influx instance on port 8086, which is InfluxDB's default port. The hostname is passed to SDSS in the environment variable INFLUX_HOST.

For data persistence, the path /var/lib/influxdb inside the Docker image should be mounted to the host filesystem.
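
Analogously, a single-instance InfluxDB container could be started like this (a sketch; the host path c:/test/influx is an example):

## Single InfluxDB instance; host path is an example
docker run --name influxdb -d -p 8086:8086 -v c:/test/influx:/var/lib/influxdb influxdb:1.8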

Sample test deployment on Windows (using a single combined Docker image)

Note that a proper value (a shared secret with an OAuth provider) for KC_CLIENT_SECRET is required. A custom secret value for SESSION_SECRET should be provided.

The exposed server starts after a 30-second delay, and will take some seconds to become responsive.

## Run once (to configure Docker)
docker login registry.fit.fraunhofer.de
mkdir c:\test            # Ensure the directory where data is to be stored exists

## Create instance on port 8080
docker run --name sdss -v c:/test:/data -p 8080:8080 --env PORT=8080 --env KC_CLIENT_ID=secure-data-storage-api --env KC_CLIENT_SECRET=X__SECRET_HERE__X --env SESSION_SECRET=pickavalue registry.fit.fraunhofer.de/efpf-pilots/secure-data-store-solution:standalone

## Cleanup if needed
docker stop -t 60 sdss   # Will shut down the container, giving it 60 seconds to complete
docker rm sdss           # Should only be done after the container is stopped

Sample test deployment on Windows (using Docker swarm with individual images for services)

Note that a proper value (a shared secret with an OAuth provider) for KC_CLIENT_SECRET is required. A custom secret value for SESSION_SECRET should be provided.

## Run once. Creates a swarm environment
docker login registry.fit.fraunhofer.de
docker swarm init
docker network create -d overlay vnet

## Windows commands to ensure paths exist
mkdir c:\test\mongo
mkdir c:\test\influx


## Cleanup if needed
docker service rm sdss
docker service rm influxdb
docker service rm mongodb

## Create docker services (with Windows paths)
docker service create --name mongodb --mount=type=bind,source=c:\test\mongo,destination=/data/db --network vnet mongo:4
docker service create --name influxdb --mount=type=bind,source=c:\test\influx,destination=/var/lib/influxdb --network vnet influxdb:1.8
docker service create --name sdss -p 8080:8080 --env PORT=8080 --env MONGO_URL=mongodb://mongodb:27017/ef-ds-1 --env INFLUX_HOST=influxdb --env KC_CLIENT_ID=secure-data-storage-api --env KC_CLIENT_SECRET=X__SECRET_HERE__X --env SESSION_SECRET=itsasecrettoeveryone --with-registry-auth --network vnet registry.fit.fraunhofer.de/efpf-pilots/secure-data-store-solution

To see the console output:

docker service logs -f sdss

User Documentation

Overview

The SDSS concept Timeseries Database corresponds to a collection of related data. An apt metaphor is a graph on which a collection of related series and values is plotted. In a common SQL database, this would correspond to a table. In Influx terminology, this is a measurement.

The actual timeseries data is stored using Influx DB, which specializes in processing data that is organized by time.

| SQL | InfluxDB | MongoDB |
|---|---|---|
| Database | Database | Database |
| Table | Measurement | Collection |
| Row | Point | Document |
| Column | Field | Field |

Data Mapping

To access data within the incoming JSON message received from the MQTT broker, a path needs to be given. This is provided as a string, as if it were to be evaluated by a JavaScript parser.

The provided path must refer to a scalar value (i.e. number, string, or boolean) within the message data.

To access the field value from the following sample data:

{
  outerData: {
    myArray: [
      {
        value: 42
      }
    ]
  }
}

the path outerData.myArray[0].value would be given, which would extract the value 42.

A string is given to specify where the value should be stored in the timeseries data.

Directions

1. Create a Database

  1. Select Timeseries Database List from the menu
  2. Click on New to edit a new Database
  3. Specify a value for Name, e.g. “Main DB”, to identify the database.
  4. Click on Save to create/update the database definition.
  5. Select Timeseries Database List from the menu, to return to the listing. The newly created database should now be listed.

2. Configure a Broker Login

  1. Select Tsdb Mqtt Broker List from the menu
  2. Click on New to edit a new Broker Configuration
  3. Specify a value for Username, which is the username for the broker account
  4. Specify a value for Password, which is the password for the broker account
  5. Specify a value for URL, which specifies how to connect to the broker. For example, mqtt://<hostname>:1883
    • Protocol, e.g., “mqtt”. Supported values: “mqtt”, “mqtts”, “tcp”, “tls”, “ws”, “wss”
    • Hostname - The hostname or IP address of the server to connect to.
    • Port for the server process. (Default for “mqtt” is 1883, and 8883 for “mqtts”)
  6. Click on Save to create/update the broker definition.
  7. Select Tsdb Mqtt Broker List from the menu, to return to the listing. The newly created broker configuration should now be listed.

3. Create a Timeseries

  1. Select Timeseries List from the menu
  2. Click on New to edit a new Timeseries
  3. Specify a value for Name
  4. Select a value for TSDB ID
  5. Select a value for MQTT Broker ID
  6. Specify a value for MQTT Topic
  7. Values for Timestamp Specification and Timestamp Path are covered in the next section.
  8. Click on Save to create/update the timeseries definition
  9. Select Timeseries List from the menu, to return to the listing. The newly created timeseries should now be listed.

4. Specify field mappings.

The data sent through MQTT needs to be mapped into the Timeseries database. The goal is to specify the subject of the data using Tags, the instant in time with the Timestamp, and the individual data samples with Fields. Details of the data model are given in the Overview section above.

  1. In the listing for Timeseries (reached by clicking on Timeseries List in the menu), click on Edit Fields.
  2. Timestamp
    1. Select a method for the timestamp specification. See Table: Timestamp Methods.
    2. Specify the Timestamp Path, i.e. the field in the JSON message from which the timestamp is read. See Data Mapping above for the path format.
  3. Tags
    1. Under the Tags section, click on New. The fields Field Type and Timeseries ID should be pre-filled.
    2. Specify a value for the JSON path, which states where to find the value in the incoming message data. See Data Mapping above for further details.
    3. Click on Save.
    4. Use the browser’s back button to return to the tag/field overview. The newly created tag should now be listed.
  4. Fields
    1. Under the Fields section, click on New. The fields Field Type and Timeseries ID should be pre-filled.
    2. Specify a value for the JSON path, which states where to find the value in the incoming message data. See Data Mapping above for further details.
    3. Click on Save.
    4. Use the browser’s back button to return to the tag/field overview. The newly created field should now be listed.

Table: Timestamp Methods

| Method | Description |
|---|---|
| UNIX_EPOCH_SECONDS | Integer value denoting seconds since the Unix Epoch (00:00:00 UTC, January 1st, 1970) |
| UNIX_EPOCH_MILLISECONDS | Integer value denoting milliseconds since the Unix Epoch (00:00:00 UTC, January 1st, 1970) |
| ISO_8601 | See https://www.iso.org/standard/40874.html or https://www.w3.org/TR/NOTE-datetime-970915. Example formats: "2020-09-03T07:50:05Z", "20200903T075005Z", "20200903T095005+02:00", "2020-09-03 09:50:05.000+02:00". See https://momentjs.com/docs/#/parsing/string/ for details. |
| RFC_2822 | See https://tools.ietf.org/html/rfc2822#section-3.3. RFC 2822 is included for legacy applications, as this format uses language-dependent abbreviations, while ISO 8601 provides clear definitions internationally. Examples: "Thu, 03 Sep 2020 09:50:05 +0200", "9 Sep 20 07:50:05 UT". See https://momentjs.com/docs/#/parsing/string/ for details. |
| MM_DD_YYYY_hh_mm_ss | American date format. 6 numbers, separated by non-digits. Examples: "9 3 2020 9 50 5", "9/3/2020 9:50:5", or even "9!3!2020!9!50!5". |
| DD_MM_YYYY_hh_mm_ss | European date format. 6 numbers, separated by non-digits. Examples: "3 9 2020 9 50 5", "3/9/2020 9:50:5", or even "3!9!2020!9!50!5". |

5. Start data collection

  1. Click on Timeseries List in the menu.
  2. For the desired timeseries, click on Enable Subscription. If successful, the button will change to Disable Subscription.
    1. (Clicking Disable Subscription will stop the monitoring.)

6. Sample queries

  1. Click on Timeseries Database List.
  2. For the desired database, click on Influx Query.
  3. Enter a query in the textbox. A sample query for the timeseries database is pregenerated, which selects the last 10 items.
    1. For details, see the Influx Query section below.
  4. Click on Query to execute the query.
  5. The results are displayed under Response.

Influx Query

Details for the Influx Query Language can be found at https://docs.influxdata.com/influxdb/v1.8/query_language/.

The name of the Influx “Measurement” is the MongoDB ID for the Timeseries Database.

API (Developers)

Influx Query Execution

Request

GET /rest/Tsdb/<Timeseries ID>/influxQuery?query=<InfluxQL Command>

URL Param/Query

| Name | Value | Description |
|---|---|---|
| Timeseries ID | (Mongo ID) | The timeseries ID |
| InfluxQL Command | InfluxQL as string | Influx Query Language command to execute |

Headers

| Header | Value | Description |
|---|---|---|
| Authorization | bearer <token> | Gives the Keycloak token to identify and authenticate the data owner |

Response

A JSON array of Influx points (specifying the timestamp, tags, and fields).
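
As a sketch, such a request could be issued with curl. The hostname, timeseries ID, and token are placeholders; the measurement name used in the InfluxQL command is the Mongo ID of the timeseries database (see Influx Query above). curl's -G/--data-urlencode options take care of URL-encoding the query:

## Placeholders: <hostname>, <Timeseries ID>, $TOKEN
curl -G -H "Authorization: bearer $TOKEN" \
  --data-urlencode 'query=SELECT * FROM "<Timeseries ID>" ORDER BY time DESC LIMIT 10' \
  "http://<hostname>:8080/rest/Tsdb/<Timeseries ID>/influxQuery"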

Sharing Data Access

Access to the primary resource, the timeseries database, can be further refined to specific time ranges, with a start and end time. (The 's' in the path below denotes a resolution of seconds.)

In the future, it will be possible to configure authorizations for access to specific time ranges.

/rest/Tsdb/:id/s/:start/:end

URL Param

| Name | Value | Description |
|---|---|---|
| id | (Mongo ID) | The timeseries ID |
| start | Integer, seconds since Unix Epoch | Starting time of the resource |
| end | Integer, seconds since Unix Epoch | Ending time of the resource |

Headers

| Header | Value | Description |
|---|---|---|
| Authorization | bearer <token> | Gives the Keycloak token to authorize the accessor |

Download CSV Resource

Produces a CSV file of the data in the sub-resource. Note that this could be quite large.

Request

The request shares its URL parameters and headers with Sharing Data Access above.

GET /rest/Tsdb/:id/s/:start/:end/csv

Response

CSV Data. Columns correspond to the attributes in the Influx points returned, and the header is named correspondingly.
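
A matching request could look like this. A sketch with a placeholder hostname, ID, and token; the start/end values (1599119405 to 1599123005) denote a one-hour window on 2020-09-03:

## Download one hour of data as CSV (placeholders: <hostname>, <id>, $TOKEN)
curl -H "Authorization: bearer $TOKEN" -o data.csv \
  "http://<hostname>:8080/rest/Tsdb/<id>/s/1599119405/1599123005/csv"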

FAQ

1. What kind of data does the Secure Data Store Solution handle?

The Secure Data Store Solution is designed for time series data.

2. What does it take to configure a Secure Data Store Solution?

In addition to a host system, the Secure Data Store Solution utilizes an OAuth provider and requires the specification of an OAuth client configuration. This is expected to be the EFPF Security Gateway.
