NiFi Data Transformation Examples

Apache NiFi is popularly knows as the Swiss Army Knife of dataflow. NiFi provides several built-in processors for data processing, conversion, transformation, enrichment, etc. Some of these processors that can be particularly used for data model transformation are introduced below.

Jolt (JoltTransformJSON Processor)

  • Jolt (JsOn Language for Transform) is a JSON to JSON transformation library written in Java.
  • The Jolt specification (spec) for performing the transformation is also written in the form of a JSON document.
  • Jolt provides a set of transforms as illustrated in the following table and each transform has its own Domain Specific Language (DSL) to specify the transform concern. These transforms can be used to transform the structure of JSON data. The inclusion of several ‘wildcards’ into the DSL of the shift transform make it the most powerful among these, enabling it to perform about 80% of the transformation work.
Transform Description/Purpose
shift To copy data from input to desired place(s) in the output JSON tree
default To add default values to the output JSON tree
remove To remove data from the tree
sort To sort the Map key values alphabetically for debugging purposes
cardinality To “fix” the cardinality of input data. E.g., the “urls” element is usually a List, but if there is only one, then it is a String
java To perform value transformation through Java code
Jolt Example 1: Shiftr
  • The following Jolt spec uses a simple shift transform / shiftr to perform data transformation
Input Data Model Output Data Model Jolt Spec

  “rating”: { 
    “quality”: { 
      “value”: 3 
    }, 
    “primary”: { 
      “value”: 4 
    } 
  } 
}
{
  “SecondaryRatings”: {
    “quality”: {
      “Value”: 3
    }
  },
  “Rating”: 4
}

{
  “rating”: {
    “quality”: {
      “value”: “SecondaryRatings.quality.Value”
    },
    “primary”: {
      “value”: “Rating”
    }
  }
}

Jolt Example 1 Integration Flow in NiFi Figure 1: Jolt Example 1 Integration Flow in NiFi

JoltTransformJSON Processor Configuration Figure 2: JoltTransformJSON Processor Configuration

Jolt Example 2: Chainr
  • The transforms listed in the table above can be chained together to form the complete Jolt transformation spec.
  • The following Jolt spec chains together shift and default operations to perform the data transformation.
Input Data Model Output Data Model Jolt Spec
{
  “temperature”: 25,
  “humidity”: 81,
  “wind”: 14,
  “precipitation”: 51,
  “location”: “Bonn”,
  “timestamp”: “2020-06-04T13:49:00.4096331Z”
}
{
  “weather-data” : {
    “temperature” : 25,
    “humidity” : 81,
    “wind” : 14,
    “precipitation” : 51,
    “location” : “Bonn”,
    “timestamp” : “2020-06-04T13:49:00.4096331Z”
  },
  “attributes” : [ “temperature”, “humidity”, “wind”, “precipitation”, “location”, “timestamp” ],
  “metric-system” : “custom”
}
[
  {
    “operation”: “shift”,
    “spec”: {
      "@": “weather-data”,
      "*": {
        "$(0)": “attributes”
      }
    }
  },
  {
    “operation”: “default”,
    “spec”: {
      “metric-system”:
                 “custom”
    }
  }
]
Jolt Example 3: Chainr
  • The following Jolt spec chains together shift, modify-default-beta and remove operations to perform the data transformation.

  • The following integration flow: (1) calls NIMBLE’s Marketplace Service directly with the EFPF credentials passed in the request, (2) Gets and transforms the response’s payload to the one that conforms to the (Mockup) Integrated Marketplace’s data model, and (3) Returns the resulting payload that conforms to the (Mockup) Integrated Marketplace’s data model

  • Template to load and run in a local instance of NiFi Jolt Chainr Example 3 Template

  • Integration flow (dataflow) in NiFi

Jolt Chainr Example 3 Integration Flow in NiFi Figure 3: Jolt Chainr Example 3 Integration Flow in NiFi

JoltTransformJSON Chainr Processor Configuration Figure 4: JoltTransformJSON Chainr Processor Configuration

Advantages and Limitations of Jolt Processor
  • The downside of Jolt is that it is not Turing complete i.e., not every aspect of data transformation can be achieved with Jolt. Jolt can be used to perform structural transformations on data but not manipulate values. Java code needs to be written to perform value transformations. Jolt is not intuitive, but easier to learn as compared to some other data transformation tools/languages such as XSLT. Having lesser options helps to keep it simple. Thus, there is a trade-off between ‘ease of use’ and transformation ‘coverage’.
  • The documentation for Jolt is comprehensive enough for the functionality it offers. In addition, in order to learn Jolt, a Jolt playground is available. NiFi provides Jolt as a built-in processor: JoltTransformJSON.

More information:

  • Jolt source code repository and readme documentation on Github: link
  • Jolt slide deck: link

XSLT (TransformXml Processor)

  • XSLT (Extensible Stylesheet Language Transformations), a WC3 standard, is a language for transforming XML documents into other XML documents. It is used in conjunction with XPath 2.0 to write rules to match parts of a parsed XML document and compile a new XML document. However, since a JSON to XML conversion is part of the specification and the rules can define functions and use regular expressions, it is applicable for other formats as well. E.g. Avro can be converted from binary to JSON and then to XML for transformation by XSLT. (Any text representation can be generated, as long as it is included between two XML root elements.)- XSLT transformations can use imports, so libraries of rules can be imported and re-used. Partial support for transformation of a set of standards can be built and extended in a modular fashion, gradually covering more and more of the schema in a value co-creation process. The XSLT transformation documents can be version managed using e.g., Git or other tools. There are scripting capabilities and regular expression support in some environments.
  • Limitations: Factors that speak against XSLT and XML are that today, it is not as widespread as e.g., JSON is and there may not be a large base of developers familiar with it. That all transformations must start and end in valid XML is also a limitation that needs to be circumvented using the above methods.
  • The TransformXml processor in NiFi transforms the XML payload using the provided XSLT script.
XSLT Example
  • Input Data Model
<?xml version="1.0" ?>
<observations>
	<observation sensorId="k20">
		<temperature>25</temperature>
		<humidity>51</humidity>
	</observation>
	<observation sensorId="k21">
		<temperature>26</temperature>
		<humidity>48</humidity>
	</observation>
</observations>
  • Output Data Model
<?xml version="1.0" encoding="UTF-8"?>
<root>
	<temperature sensorId="k20">25</temperature>
	<temperature sensorId="k21">26</temperature>
</root>
  • XSLT Code
<?xml version="1.0" encoding="UTF-8"?>
	<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
		<xsl:output method="xml" indent="yes"/>
		<xsl:template match="/observations">
			<root>
				<xsl:apply-templates select="observation"/>
			</root>
		</xsl:template>
		<xsl:template match="observation">
			<temperature sensorId="{@sensorId}">
				<xsl:value-of select="temperature" />
			</temperature>
		</xsl:template>
</xsl:stylesheet>

XSLT Example 1 Integration Flow in NiFi Figure 5: XSLT Example 1 Integration Flow in NiFi

TransformXml Processor Configuration Figure 6: TransformXml Processor Configuration

  • A lookup service such as the ‘SimpleKeyValueLookupService Controller Service’ needs to be used to store and retrieve the XSLT script.

SimpleKeyValueLookupService Controller Service Configuration Figure 7: SimpleKeyValueLookupService Controller Service Configuration

Data Transformation using Scripts (ExecuteScript Processor)

  • ExecuteScript is a processor provided by Apache NiFi that facilitates users in writing a script for performing data transformation.
  • It takes data input in the form of a FlowFile – the data serialization format used by NiFi, processes it as specified in the script and finally generates another FlowFile as output that contains the transformed data.
  • The languages supported for writing the data transformation script are Clojure, ECMAScript, Groovy, Lua, Python and Ruby.
  • An example where the ExecuteScript processor uses Javascript code to perform the data transformation is illustrated below.
ExecuteScript Example
  • Input Data Model
{
  "rating": {
    "quality": {
      "value": 3
    },
    "primary": {
      "value": 4
    }
  }
}
  • Output Data Model
{
  "SecondaryRatings": {
    "quality": {
      "Value": 3
    }
  },
  "Rating": 4
}
  • Javascript code in ExecuteScript processor
var StreamCallback =  Java.type("org.apache.nifi.processor.io.StreamCallback");
var IOUtils = Java.type("org.apache.commons.io.IOUtils");
var StandardCharsets = Java.type("java.nio.charset.StandardCharsets");

var flowFile = session.get();
if(flowFile != null) {
  try {
    // Create a new StreamCallback, passing in a function to define the interface method
    flowFile = session.write(flowFile,
            new StreamCallback(function(inputStream, outputStream) {
              var text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
              var inputJson = JSON.parse(text)
              var outputJson = {}
              outputJson.SecondaryRatings = {}
              outputJson.SecondaryRatings.quality = {}
              outputJson.SecondaryRatings.quality.Value = inputJson.rating.quality.value
              outputJson.Rating = inputJson.rating.primary.value
              outputStream.write(JSON.stringify(outputJson).getBytes(StandardCharsets.UTF_8))
            }));

    // Last operation is transfer to success (failures handled in the catch block)
    session.transfer(flowFile, REL_SUCCESS)
  } catch(e) {
    log.error('Something went wrong', e)
    session.transfer(flowFile, REL_FAILURE)
  }
}

ExecuteScript Example Integration Flow in NiFi for Data Transformation Figure 8: ExecuteScript Example Integration Flow in NiFi for Data Transformation

ExecuteScript Processor Configuration Figure 9: ExecuteScript Processor Configuration

Advantages and Limitations of ExecuteScript Processor
  • ExecuteScript is a versatile processor that enables the user to write source code to perform any kind of processing on data as desired.
  • The ExecuteScript processor does not include the functionality to debug the code added to it.
  • As per the official documentation of NiFi, this processor is experimental and therefore, the impact of sustained usage is not known.

Data Spine NiFi Documentation

References

[1] NiFi Documentation: https://nifi.apache.org/docs.html
[2] NiFi ExecuteScript Cookbook: link
[3] EFPF Data Spine D3.2 Deliverable: link

Previous
Next