Anomaly Detection Quickstart Guide

Accessing from the EFPF portal

Before you can access the Anomaly Detection tool, your EFPF account must be assigned to one or more groups. To request this, send an email to jacob.griffiths@informationcatalyst.com. In the email, specify either the name of the group you need to be added to or, if a new group must be created, the name of your organisation. Also include the email address of every user who needs to be added to the specified group.

Open a browser, go to the URL below and log in to access the EFPF portal:


Click here to go to log-in page

From the EFPF portal interface, open the Anomaly Detection component by first selecting the Data Analytics tool in the left-hand panel and then selecting either the Anomalies card or the Anomaly Detection Solution, as shown in the following figures.

Running workflows

The web-based GUI of the Anomaly Detection tool (AD) is designed as a dashboard with several tabs, as shown in the following figure:

The web-based GUI of the AD

The AD is implemented as a web-based solution, which is accessible from the EFPF Portal. The AD is packaged and deployed as a Docker container.

Workflow: This tab contains a set of analytic workflows, each of which implements one machine learning algorithm in the AD. Users build machine learning models with these workflows. The following topics describe how to build a model by using a workflow.

Each workflow requires two datasets in CSV format (samples can be downloaded from the link given on the interface). The first dataset is used to create and train a machine learning model and should contain values that represent the expected behaviour of the machine. The second dataset is used to test the machine learning model by looking for anomalies in the data. The recommended training/testing proportions are 60/40, 70/30 or 75/25 (percentages).
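The two CSV files can be prepared in many ways. The following is a minimal Python sketch, assuming a single recording whose first part represents normal behaviour and is split in order at 70/30. The function names, column names and readings are hypothetical and are not part of the AD.

```python
import csv
import io


def split_rows(rows, train_fraction=0.7):
    """Split rows in order: the first part for training, the rest for testing."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def write_csv(header, rows):
    """Return CSV text with the header line followed by the given rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sensor readings; a real file would be loaded with csv.reader.
header = ["temperature", "pressure"]
rows = [[str(20 + i), str(100 + 2 * i)] for i in range(10)]

train_rows, test_rows = split_rows(rows, train_fraction=0.7)
train_csv = write_csv(header, train_rows)  # content for the training dataset field
test_csv = write_csv(header, test_rows)    # content for the testing dataset field
```

Changing train_fraction to 0.6 or 0.75 gives the other recommended proportions.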

Each workflow takes the user through the following steps:

a. Select a workflow from the dashboard

b. Provide the two datasets in the web-based form

c. Analyse the correlation matrix computed from the datasets

d. Perform Training & Testing of the machine learning model

e. Download the generated model for deployment or reuse
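To make step c more concrete, the sketch below computes a correlation matrix for a few columns in plain Python. It assumes Pearson correlation, which this guide does not confirm as the method used by the AD, and the column names and values are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric columns.

    Raises ZeroDivisionError if either column is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical sensor columns (assumption, not taken from the sample datasets).
columns = {
    "temperature": [20.0, 21.0, 22.0, 23.0, 24.0],
    "pressure": [100.0, 102.0, 104.0, 106.0, 108.0],
    "vibration": [5.0, 4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(columns)
```

Values close to 1 or -1 indicate strongly correlated sensor pairs, and values near 0 indicate weak correlation.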

The following table describes the different steps mentioned above:

Figure Description
DLE step 1 Choose a workflow

1. Select the Workflow tab
2. Choose a workflow, e.g. the Neuronal Network Values per columns figure
DLE step 2 Workflow datasets form

1. Type a name for the ML model to be developed
2. Choose the dataset file (CSV) for training
3. Choose the dataset file (CSV) for testing

Note: Users can download a sample of each dataset from the link “here” shown under each dataset field
DLE step 3 Importation

1. Verify the sample of the training dataset
2. Verify the sample of the test-dataset
3. Check the box for each data parameter that should be included in building the ML model
4. Click on the Correlation button to display the Correlation charts
DLE step 4 Correlation matrix

1. Analyse the correlation matrix table
2. The correlation matrix chart is shown (strong correlations appear in a dark colour)
3. A correlation matrix plot shows how spread out the values are between each pair of sensors
4. Click on the button to perform the anomaly detection training
DLE step 5 Training & Testing

This stage uses the training dataset to train the model, looking for the hidden patterns that make anomaly detection possible.
A table with the values of the selected parameters is shown together with the relevant threshold, so that users can identify values that fall outside normal behaviour. The following steps are available to the user:

1. Analyse results of the training
2. Analyse the table of anomalies in the test-dataset
3. Analyse the whisker plot, which shows the values inside and outside the RMSE threshold; the chart is split into two groups, one for the training dataset and one for the test dataset
4. Click on the button to make a prediction on the test dataset
DLE step 6 Downloading the generated model

1. Once the user has clicked the button to make predictions, a message notifies the user that the ML model is available in POJO, JSON and MOJO formats
2. A table with a sample of the predictions is displayed
3. Click on the button to download the trained model as a ZIP file. This ZIP file (4) contains the model in POJO (5), JSON (6) and MOJO (7) formats.
What is a MOJO?
A MOJO (Model Object, Optimized) is an alternative to POJO; MOJOs do not have a size restriction. A MOJO is a ZIP file as well (8).

Note: The file marked with the number 9 in the figure contains the machine learning libraries used to implement a model in the production environment
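The Training & Testing step above refers to an RMSE threshold. The sketch below illustrates the general idea of flagging rows whose error exceeds a threshold. It assumes that each row has a reconstructed or predicted counterpart and that the threshold is the largest error seen on the training data; neither assumption is confirmed by this guide, and all names and numbers are hypothetical.

```python
import math


def row_rmse(actual, reconstructed):
    """Root mean square error between one row and its reconstruction."""
    squared = [(a - r) ** 2 for a, r in zip(actual, reconstructed)]
    return math.sqrt(sum(squared) / len(squared))


def flag_anomalies(errors, threshold):
    """Return True for every error that lies above the threshold."""
    return [error > threshold for error in errors]


# Hypothetical per-row errors obtained on the training dataset.
train_errors = [0.10, 0.12, 0.08, 0.11]
# Assumption: the threshold is the worst error seen on normal (training) data.
threshold = max(train_errors)

# Hypothetical per-row errors obtained on the test dataset.
test_errors = [0.09, 0.30, 0.11]
flags = flag_anomalies(test_errors, threshold)
```

Rows flagged True correspond to the values that the whisker plot would show outside the threshold.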

Deploying a model

ML-Model Deployer: From the tab menu (as shown in the following figure), the ML-Model Deployer tab provides users with information about existing models (1) and the operations that can be performed on them (2). The GUI shows details of the already developed models (3) and provides the operations (run/stop, delete, visualise) that can be performed on them (4).

ML-Model Deployer

The following table describes the steps that users can perform on the Deployer tab:

Figure Description
Stopping a model
1. Press the stop button
2. The status changes from Implemented to Not Implemented and the button changes from Stop to Run
Deleting a model
1. Click on the bin button
2. Confirm (Ok) or Cancel the action
Stream visualization
1. Click on the Visualization button
2. The Stream chart will be displayed
3. Sensor values are updated each time data is streamed

What is an implemented or running model?

Once a model is created and stored, it can be implemented (run), which connects it to a SOURCE (data stream) queue in the data broker (1). The running model receives messages (2) from that queue on the broker and analyses them in real time with the deployed anomaly detection model. Upon detecting an anomaly, the model tags the particular message (or sensor reading) with an extra parameter named isAnomaly (3). This additional parameter is a Boolean value that indicates whether a row of data is an anomaly (1/true) or not (0/false). This process is represented in the figure below.
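The tagging described above can be pictured with a short Python sketch. It assumes that messages are JSON objects and that every message receives the isAnomaly field; the message format, the field names other than isAnomaly and the stand-in check are hypothetical, since the real decision is made by the deployed model.

```python
import json


def tag_message(body, is_anomalous):
    """Return the JSON message with an added isAnomaly field."""
    reading = json.loads(body)
    reading["isAnomaly"] = bool(is_anomalous(reading))
    return json.dumps(reading)


def over_limit(reading):
    """Hypothetical stand-in for the trained model: a fixed temperature limit."""
    return reading["temperature"] > 80.0


# Hypothetical messages as they might arrive from the SOURCE queue.
tagged_hot = tag_message('{"temperature": 95.5, "pressure": 101.2}', over_limit)
tagged_ok = tag_message('{"temperature": 21.0, "pressure": 101.4}', over_limit)
```

A consumer of the tagged messages can then filter on isAnomaly without knowing anything about the model itself.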

Running model

Broker: From the AD dashboard, the Broker tab provides an interface to the EFPF message/data broker (https://rabbitmq.smecluster.com/#/queues). Using the broker interface, the user can perform the following actions (as shown in the following figure):

  1. Choose the Broker board

  2. Choose a virtual host

  3. Explore the SOURCE queue

  4. Explore the ANOMALIES queue


Publisher: From the AD dashboard, the Publisher tab allows users to read a dataset (CSV file) and create the messages to be sent to a specific queue in the EFPF Broker. In this way, the publisher can be used to simulate a sensor/machine.

To publish a dataset without ID (where the raw data has no associated identifier, e.g. a time stamp), follow these steps:

Figure Description
Dataset Non-ID
1. Choose the Publisher tab
2. Choose the Dataset Non-Id icon
Publisher form
1. Fill in the form
2. Click on the submit button to publish the dataset as messages
3. The number of messages published is shown in this bar
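What the Publisher does with a CSV file can be approximated by the following sketch, which turns each data row into one message body. It assumes JSON message bodies and purely numeric columns; both are assumptions, the function name is hypothetical, and the actual sending to a broker queue is left out.

```python
import csv
import io
import json


def rows_to_messages(csv_text):
    """Turn each CSV data row into one JSON message body."""
    reader = csv.DictReader(io.StringIO(csv_text))
    messages = []
    for row in reader:
        # Convert the numeric strings to floats so consumers receive numbers.
        payload = {name: float(value) for name, value in row.items()}
        messages.append(json.dumps(payload))
    return messages


# Hypothetical dataset without an ID column such as a time stamp.
sample = "temperature,pressure\n20.5,101.2\n21.0,101.4\n"
messages = rows_to_messages(sample)
print(f"{len(messages)} messages ready to publish")
```

Each element of messages would then be sent to the chosen queue, one message per row, which is how the Publisher simulates a sensor.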

