Matchmaking Developer Guide

Matchmaking Service Developer Guide

Introduction

The EFPF Federated Search & Matchmaking Service is developed as a Spring Boot microservice. It incorporates the Matchmaking Ontology and an API for accessing the indexed data. Additionally, it hides the complexity of using a full-featured search engine (Apache Solr) and provides simplified access to the indexed data.

Source Code

The source code is available from GitLab. Run:

git clone https://gitlab.fit.fraunhofer.de/efpf-pilots/matchmaking-services

to obtain the source code. The repository contains the following directories:

  1. solr-data-model : Holds the Java implementation of the federated index data model, e.g. descriptions for classes, properties, parties and items/products/services. The distinct index collections in Apache Solr are created automatically based on these classes.
  2. federated-search-service : Java Spring Boot microservice for federated search. This service provides functionality to index and search ontologies, parties and items using Apache Solr as the index provider.
  3. docker-setup : Spring Cloud microservices infrastructure, Apache Solr server and the federated-search-service packaged as Docker containers. This setup can be used to deploy the matchmaking service (services folder) and the required infrastructure (infra folder) on any host machine that runs Docker.
  4. nifi-resources : The NiFi data transformation files and the matchmaking NiFi template resources.

Solr-Data-Model

Provides the Solr data model, which reflects the Matchmaking Ontology.

To build the data model:

cd solr-data-model
mvn clean install

The Solr model is created with Java classes that are annotated in a JPA-like style, more precisely with Spring Data Solr annotations. These define the name of the collection and the respective index fields to use when storing documents (e.g. classes, properties, parties).

When indexing arbitrary models, such as product items or companies, a basic model is required for concepts, their properties and the possible value lists of properties. This basic model is shown in Figure 1.

Figure 1: Core Concept Model for Indexing (Java methods omitted)

The model is inspired by the SKOS ontology but has been adapted for use with the index. The (abstract) base class Concept specifies the respective multilingual labelling, such as

  • preferred name
  • alternate names
  • hidden names
  • description
  • comments

for each concept. The core classes inheriting from Concept are

  • ClassType: Stores Concept Classes, interlinks with properties, allows hierarchical levels, implements IPropertyAware.
  • PropertyType: Stores meta-data for properties, interlinks with classes, may point to a list of values (CodedType)
  • CodedType: Stores allowed values for a property.

All of the classes inherit the functionality from Concept, so they may be equipped with preferred, alternate and hidden labels in many languages. In addition, the ClassType object implements IPropertyAware.
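The relationship between the core classes can be pictured with a small sketch. It is written in Python for brevity and is not the project's Java implementation: only the class names Concept, ClassType, PropertyType and CodedType come from the model above, while the attribute names (`preferred`, `alternate`, `hidden`, `codes`) and the helpers `label_for` and `add_property` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """Illustrative base type: multilingual labels keyed by language code."""
    uri: str
    preferred: Dict[str, str] = field(default_factory=dict)        # one label per language
    alternate: Dict[str, List[str]] = field(default_factory=dict)  # several labels per language
    hidden: Dict[str, List[str]] = field(default_factory=dict)

    def label_for(self, lang: str, fallback: str = "en") -> Optional[str]:
        """Return the preferred label in `lang`, else in the fallback language."""
        return self.preferred.get(lang) or self.preferred.get(fallback)


@dataclass
class CodedType(Concept):
    """An allowed value for a property."""


@dataclass
class PropertyType(Concept):
    """Meta-data for a property; may point to a list of allowed values."""
    range: Optional[str] = None
    codes: List[CodedType] = field(default_factory=list)


@dataclass
class ClassType(Concept):
    """A concept class that carries its assigned properties (hypothetical layout)."""
    properties: Dict[str, PropertyType] = field(default_factory=dict)

    def add_property(self, prop: PropertyType) -> None:
        # properties are keyed by their URI
        self.properties[prop.uri] = prop
```

Because every type derives from Concept, a ClassType, a PropertyType and a CodedType can all be labelled in several languages in the same way.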

IPropertyAware

The interface IPropertyAware provides additional methods for maintaining PropertyType data along with the implementing collection. As indicated, the ClassType collection implements IPropertyAware, so a ClassType may be equipped with the corresponding PropertyType meta data. As an example, the concept GeoLocation is defined with the properties longitude and latitude. Both PropertyType meta data objects describing longitude and latitude may be placed inside the ClassType meta data. The ClassType for the GeoLocation then looks as follows:

{
    // the languages in use are maintained
    "languages": [     
        "de",
        "en"
    ],
    // preferred Label, for each language
    "label": {
        "de": "GEO Position",
        "en": "GEO Location"
    },
    "comment": {
        "en": "Position object: latitude, longitude"
    },
    "type": "class",
    // the list of assigned properties
    "properties": [    
        "urn:test:geoLocation:latitude",
        "urn:test:geoLocation:longitude"
    ],
    "propertyMap": {   
        
        // map of assigned property meta-data
        // note: each property may be retrieved separately
        "urn:test:geoLocation:latitude": {
            "languages": [
                "de",
                "en"
            ],
            "label": {
                "de": "Geografische Breite",
                "en": "Latitude"
            },
            "type": "property",
            "range": "http://www.w3.org/2001/XMLSchema#double",
            "valueQualifier": "NUMBER",
            
            // property usage for each collection
            "propertyUsage": {  
                "class": [
                    "urn:test:geoLocation"
                ]
            },
            "required": true,
            "id": "urn:test:geoLocation:latitude",
            "federatedId": "urn:test:geoLocation:latitude",
            "uri": "urn:test:geoLocation:latitude"
        },
        "urn:test:geoLocation:longitude": {
            "languages": [
                "de",
                "en"
            ],
            "label": {
                "de": "Geografische Länge",
                "en": "Longitude"
            },
            "type": "property",
            "range": "http://www.w3.org/2001/XMLSchema#double",
            "valueQualifier": "NUMBER",
            "propertyUsage": {
                "class": [
                    "urn:test:geoLocation"
                ]
            },
            "required": true,
            "id": "urn:test:geoLocation:longitude",
            "federatedId": "urn:test:geoLocation:longitude",
            "uri": "urn:test:geoLocation:longitude"
        }
    },
    "collection": "class",
    "id": "urn:test:geoLocation",
    "federatedId": "urn:test:geoLocation",
    "uri": "urn:test:geoLocation"
}

That is, the PropertyType data is embedded in the ClassType object to which it is assigned. The indexing service checks for IPropertyAware objects and performs the following steps. It

  • stores the ClassType in its collection
  • stores the PropertyType in the respective collection
  • maintains the interlinking between the PropertyType collection and the corresponding IPropertyAware collection.
  • restores the PropertyType data with the assigned ClassType on retrieval.
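The steps above can be sketched on plain dictionaries shaped like the GeoLocation example. This is a simplified illustration, not the service's implementation: the function names `split_class_document` and `restore_class_document` are hypothetical, and the property index is assumed to be a simple mapping from URI to document.

```python
from copy import deepcopy
from typing import Dict, List, Tuple


def split_class_document(class_doc: dict) -> Tuple[dict, List[dict]]:
    """Separate a ClassType document from its embedded PropertyType documents."""
    class_only = deepcopy(class_doc)
    property_map = class_only.pop("propertyMap", {})
    # the class keeps only the property URIs as links to the property collection
    uris = set(class_only.get("properties", [])) | set(property_map)
    class_only["properties"] = sorted(uris)
    return class_only, list(property_map.values())


def restore_class_document(class_only: dict, property_index: Dict[str, dict]) -> dict:
    """Re-attach the PropertyType documents to a ClassType on retrieval."""
    restored = deepcopy(class_only)
    restored["propertyMap"] = {
        uri: property_index[uri]
        for uri in restored.get("properties", [])
        if uri in property_index
    }
    return restored
```

Storing the two parts separately keeps each property retrievable on its own, while the restore step gives the caller the combined view shown in the JSON example.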

The PropertyType collection, however, holds descriptive meta data for arbitrary attributes in any of the other collections. By applying the IPropertyAware interface to the respective collection implementation, the property index can be interlinked, for example to name those attributes in a user interface. Overall, ClassType concepts denote any kind of abstract concept, so they may be equipped with the properties relevant for that concept. Their main usage, however, is the classification of arbitrary indexed objects. To allow seamless use with other collections, the interface IClassificationAware is used.

IClassificationAware

By applying the IClassificationAware interface to any of the collections, the interlinking between the classified collection and the classification data (ClassType collection) can be established.

Figure 2: IClassificationAware and implementing classes

Currently, the collections storing items, companies and business opportunities implement this interface. That way, the entries of all these collections may be annotated with concepts from the ClassType collection. If a provided concept (addClassification(ClassType concept)) is not present, it is added automatically to the ClassType index. The classification meta data is restored and integrated in the response when querying for items, companies or business opportunities.
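The behaviour described above can be approximated as follows. The sketch works on dictionaries and an in-memory class index. The function names and the `classificationMetadata` key are hypothetical, and only the `classification` field name is taken from the query examples later in this guide.

```python
from typing import Dict, List


def add_classification(item: dict, concept: dict, class_index: Dict[str, dict]) -> None:
    """Link `item` to `concept`; index the concept first if it is unknown."""
    uri = concept["uri"]
    if uri not in class_index:
        class_index[uri] = concept  # missing concepts are added automatically
    links: List[str] = item.setdefault("classification", [])
    if uri not in links:
        links.append(uri)


def with_classification_metadata(item: dict, class_index: Dict[str, dict]) -> dict:
    """Return a copy of `item` with the linked concepts resolved, as on retrieval."""
    resolved = dict(item)
    resolved["classificationMetadata"] = [
        class_index[uri] for uri in item.get("classification", []) if uri in class_index
    ]
    return resolved
```

The item itself stores only the concept URIs, and the full concept description is looked up when the item is read.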

ICustomPropertyAware

The description of classes, properties and property values is completely predefined; the required attributes are already specified in the core model as outlined in Figure 1. Other objects to be indexed, such as catalogue items, companies, persons and business opportunities,

  • inherit the core attributes defined in Concept
  • extend the core functionality by adding already known attributes with getter/setter methods
  • implement the ICustomPropertyAware interface, to allow additional, not predefined attributes.

The predefined attributes in the distinct collections are managed with corresponding getter/setter methods. The ICustomPropertyAware interface, however, injects methods that allow the definition of completely custom attributes, including the corresponding meta data, as outlined above for the IPropertyAware interface.

Figure 3: ICustomPropertyAware and implementing classes

The interface provides a set of default methods for its implementors. They allow adding custom properties along with flexible classifiers to any catalogue item, party or business opportunity. The classifiers are required to store the attribute value in the index, since they are used for naming the dynamic index fields.

ItemType, PartyType and OpportunityType

The main task of the Federated Search Service is the management of index collections for catalogue items, company/party data and business opportunities. Each of them is a single collection in the federated search index.

Figure 4: PartyType Model for Company Data Indexing (Java methods omitted)

All of the distinct types inherit from Concept and thus receive its fields for multilingual labelling. They then add the already known fields (including getter/setter methods) relevant to catalogue items, companies and business opportunities that are used in any of the federated platforms. All of the types may be classified with concepts from the ClassType index, i.e. they implement IClassificationAware, which injects the required functionality. On every update of the respective collections, the federated search service checks for provided classification links and maintains both the classification meta data in the ClassType collection and the linking to PartyType, OpportunityType and ItemType. The same holds for the possibility of adding arbitrary (currently unknown) attributes to the indexed collections, where the required functionality is injected with ICustomPropertyAware. Here, the provided custom attributes are managed with the PropertyType collection.

Federated Search Service

The stand-alone federated search service is implemented using Spring Boot. It provides services for manipulating the search index as well as search functionality supporting full-text search and faceted search over the indexed collections. Follow the steps below to run the service in your local setup, assuming the Apache Solr service runs locally as well.

Dependencies

As outlined above, a running Apache Solr service is required for the federated-search-service to start. If necessary, download it from the Apache Solr website. Downloading and extracting the binary zip file should be sufficient. It is also possible to use the provided Docker image. To start Apache Solr, run:

<solr.home>/bin/solr start -cloud

Once Apache Solr is running, point the browser to the Solr Admin Page to verify the installation. No collections exist at this point. The respective collections are created automatically on startup of the federated-search-service.

Note: To stop the Solr service, run

<solr.home>/bin/solr stop -all

Note that the federated-search-service is then no longer functional.

To start the federated-search-service, run:

cd federated-search-service
mvn clean spring-boot:run

During startup, the service tries to create new collections in the configured Solr server, in particular

  • Company Information (party)
  • Categories (class)
  • Properties (props)
  • Item (products/services)
  • Business Opportunity

The federated-search-service manages company-related data (including contact information) for convenient search. The other collections, in particular class and props, are designed to store accompanying information, for example categories and descriptions of the properties in use. For now, only the party collection is in use.

Docker setup

To deploy the federated-search-service and the required infrastructure (e.g. Apache Solr, Spring Cloud infrastructure) in a containerized environment, a docker-setup is made available. See Gitlab Intro (login required) for details.

API for manipulating the federated search index

For each of the distinct collections (replace <collection> with one of class, property, code, party, item, opportunity), the following API methods are available:

| Method | Context Path | Description |
|--------|--------------|-------------|
| GET | /<collection>?uri=<federated URI> | Read a distinct index item from the desired collection by its federated URI. |
| GET | /<collection>?basePlatform=<basePlatform>&id=<id> | Read a distinct index item from the desired collection by its base platform and id. |
| POST | /<collection> | Create or update an object in the desired collection. |
| DELETE | /<collection>?uri=<federated URI> | Remove an object from the desired collection. |
| GET | /<collection>/fields | Obtain a list of the index fields in use. An index field name may be constructed dynamically; for example, a label in a given language (“de”) is stored in the field de_label. The field name is important when using the faceting mechanisms. |
| GET | /<collection>/select | Search for items in the desired collection. The request parameters allow fine-tuning the search by passing a query string (free text) or multiple field query expressions that filter on dedicated field names. Faceting is also supported by this method. |
| POST | /<collection>/search | Search for items in the desired collection. Unlike the select method, the query parameters are provided as a Search object to overcome the HTTP request length limitations. |
| GET | /<collection>/suggest | Minimalistic search method. It allows, for example, searching a label field (e.g. en_label) for suggested values in a drop-down list, i.e. querying while typing. |
| POST | /<collection>/reset?basePlatform=<basePlatform> | Consumes a list of <collection> entries, replacing the existing entries in the collection. |

The request parameters are in line with the Lucene search syntax. The parameters are as follows:

  • q=<query string> defaults to *:*
  • fq=<field query expr>&fq=<field query expr>
  • facet.field=<field_name>&facet.field=<field_name>
  • facet.mincount=<mincount> defaults to 1
  • rows=<number of items per page>
  • start=<the requested page number>

The fq and the facet.field parameters may be provided multiple times.
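Because fq and facet.field may repeat, the query string is easiest to assemble from a list of key/value pairs rather than from a dictionary. The following minimal sketch uses only the Python standard library. The function name `build_select_query` and its default values for rows and start are assumptions, while the parameter names are the ones listed above.

```python
from urllib.parse import urlencode


def build_select_query(q="*:*", fq=(), facet_fields=(), mincount=1, rows=10, start=0):
    """Build the (percent-encoded) query string for GET /<collection>/select."""
    params = [("q", q)]
    # repeated parameters: one pair per field query and per facet field
    params += [("fq", expr) for expr in fq]
    params += [("facet.field", name) for name in facet_fields]
    params += [("facet.mincount", mincount), ("rows", rows), ("start", start)]
    return urlencode(params)


query = build_select_query(
    fq=["class.allLabels:*Company*", "opportunity.allLabels:*Description*"],
    facet_fields=["classification", "en_origin"],
)
url = "https://efpf-security-portal.salzburgresearch.at/api/index/party/select?" + query
```

urlencode percent-encodes characters such as * and :, which keeps the URL valid even when a field query contains spaces or other reserved characters.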

Direct read of stored objects

Obtain the stored object from the index. The uri (or the basePlatform and id) must be provided.

curl -X GET "https://efpf-security-portal.salzburgresearch.at/api/index/party?uri=urn:test:party" \
	-H  "accept: */*" \
	-H  "Authorization: Bearer …" \
	-H  "Content-Type: application/json"

This will return the party type object as JSON.

Search the party index with GET request

Search the party index for companies that are classified with a ClassType where one of the labels contains the value “Company” AND that are annotated with a business opportunity where at least one label contains the value “Description”:

curl -X GET "https://efpf-security-portal.salzburgresearch.at/api/index/party/select\
?q=*:*\
&fq=class.allLabels:*Company*\
&fq=opportunity.allLabels:*Description*\
&facet.field=classification\
&facet.field=en_origin\
&facet.field=en_activitySectors" \
	-H  "accept: */*" \
	-H  "Authorization: Bearer …" \
	-H  "Content-Type: application/json"

Note: The stars in the query expressions are wildcards. The field queries join the party index with the class index (based on the classification field) and with the opportunity index. Only those parties are selected that are classified with a concept where the term Company appears in any label of the concept description AND that offer a Business Opportunity where the term Description can be found in any label of the Business Opportunity.

Search the party index with POST request

Adding a large number of facet fields or field query expressions may exceed the allowed length of the resulting GET request. As a solution, a search object providing exactly the same information may be posted to the indexing service.

{
    "q": "*:*",
    "fq": [
        "class.allLabels:*Company*",
        "opportunity.allLabels:*Description*"
    ],
    "facet": {
        "field": [
            "classification",
            "en_origin",
            "en_activitySectors"
        ],
        "limit": 10,
        "minCount": 1
    },
    "start": 0,
    "rows": 10
}

This Search JSON object may be sent to the service with

curl -X POST "https://efpf-security-portal.salzburgresearch.at/api/index/party/search" \
	-H  "accept: */*" \
	-H  "Authorization: Bearer …" \
	-H  "Content-Type: application/json" \
	-d  "<json-data>"

The Search object also outlines the possibility of adding facet parameters to a search.
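A client may prefer to assemble the Search object programmatically instead of writing the JSON by hand. The sketch below mirrors the structure of the example object. The function name `build_search_object` and its defaults are assumptions, and only the keys shown in the example are used.

```python
import json


def build_search_object(q="*:*", fq=(), facet_fields=(), facet_limit=10,
                        min_count=1, start=0, rows=10):
    """Assemble the body for POST /<collection>/search as a plain dictionary."""
    return {
        "q": q,
        "fq": list(fq),
        "facet": {
            "field": list(facet_fields),
            "limit": facet_limit,
            "minCount": min_count,
        },
        "start": start,
        "rows": rows,
    }


search = build_search_object(
    fq=["class.allLabels:*Company*", "opportunity.allLabels:*Description*"],
    facet_fields=["classification", "en_origin", "en_activitySectors"],
)
body = json.dumps(search, indent=4)  # string to send as the request body
```

The resulting `body` can be passed as the request payload with the Content-Type header set to application/json.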

Delete companies from the party index

The delete method requires the URI of the respective object to delete.

curl -X DELETE "https://efpf-security-portal.salzburgresearch.at/api/index/party?uri=urn:test:party" \
	-H  "accept: */*" \
	-H  "Authorization: Bearer …" \
	-H  "Content-Type: application/json"

The method returns true if the deletion was successful.

Using the /<collection>/fields endpoint

For applying field query expressions or to properly use facet fields, it is required to know the used field names in the index. For the predefined attributes of the distinct collection types, the resulting name of the attribute in the index is well known. Many of the attributes however have dynamic parts, such as the multilingual labels or the custom attributes.

For those attributes, the resulting field name in the index cannot be predicted. But for using the faceting options, and field query expressions, it is important to know the field names exactly.

For that purpose, the fields endpoint reports the field names in use per collection. Additionally, it reports whether a field has been created dynamically, the dynamic part itself and, most importantly, a descriptive label for the field when it is linked with the PropertyType collection. The result when retrieving the fields for a collection looks like the following JSON structure:

[
    {
        "fieldName": "en_activitySectors",
        "dataType": "string",
        "docCount": 541,
        "dynamicBase": "*_activitySectors",
        "mappedName": "en_activitySectors",
        "dynamicPart": "en"
    },
    {
        "fieldName": "en_origin",
        "dataType": "string",
        "docCount": 261,
        "dynamicBase": "*_origin",
        "mappedName": "en_origin",
        "dynamicPart": "en"
    },
    {
        "fieldName": "classification",
        "dataType": "string",
        "docCount": 584,
        "mappedName": "classification",
        "uri": "urn:property:classification",
        "label": {
            "de" : "Klassifikation",
            "en" : "Classification"
        }
    }, …
]

Note: The relationship between the index field name and the corresponding property is managed automatically when using the API. The linking information, however, is stored in the itemFieldName attribute of the PropertyType collection.
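A client can use the fields response to choose facet fields and to label them in a user interface. The following sketch assumes the response has already been parsed into a list of dictionaries with the keys shown above. The helper names `facet_labels` and `fields_for_base` are hypothetical.

```python
from typing import Dict, List


def facet_labels(fields: List[dict], lang: str = "en") -> Dict[str, str]:
    """Map each index field name to a display label.

    Falls back to the mapped name when no label is linked for the language.
    """
    labels = {}
    for entry in fields:
        label = entry.get("label", {}).get(lang)
        labels[entry["fieldName"]] = label or entry.get("mappedName", entry["fieldName"])
    return labels


def fields_for_base(fields: List[dict], dynamic_base: str) -> Dict[str, str]:
    """Return {dynamicPart: fieldName} for one dynamic base such as '*_origin'."""
    return {
        entry["dynamicPart"]: entry["fieldName"]
        for entry in fields
        if entry.get("dynamicBase") == dynamic_base
    }
```

For example, `fields_for_base(fields, "*_activitySectors")` lists the languages for which an activity sector field exists, so the matching field name can be passed as facet.field.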

NiFi Resources

This folder holds the JOLT transformations in use for the data gathering from distinct portals.

According to the Matchmaking Architecture, the data gathering is performed by an Apache NiFi pipeline, which performs the following steps for each connected portal (iQluster, Composition, SMECluster, Nimble):

  1. remove data for the processed portal from the party or item collection
  2. contact the portals/services (dedicated API Calls) to collect the data
  3. transform the collected data into the common data format (see Matchmaking Ontology) using JOLT transformations
  4. store the transformed data in the index
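The order of these steps can be illustrated without NiFi. The sketch below is not the pipeline itself, which is built from NiFi processors and JOLT specifications. It is an in-memory Python approximation in which the index is a list of dictionaries, and `fetch` and `transform` are hypothetical callables standing in for the portal API calls and the JOLT transformation.

```python
from typing import Callable, Iterable, List


def refresh_portal(index: List[dict], base_platform: str,
                   fetch: Callable[[], Iterable[dict]],
                   transform: Callable[[dict], dict]) -> List[dict]:
    """Replace all index entries of one portal with freshly gathered data."""
    # 1. remove the data of the processed portal
    kept = [doc for doc in index if doc.get("basePlatform") != base_platform]
    fresh = []
    for raw in fetch():                        # 2. collect the data from the portal
        doc = transform(raw)                   # 3. map it to the common data format
        doc["basePlatform"] = base_platform    # tag the source for later distinction
        fresh.append(doc)
    return kept + fresh                        # 4. store the transformed data
```

Removing the old entries before storing the new ones means that records deleted in a portal also disappear from the index on the next run.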

For a complete understanding of how JOLT works, the JOLT Demonstrator may be used.

How to modify or extend the Matchmaking Service

Add more Data Sources

Currently the Matchmaking Service combines company, catalogue and business opportunity data from

  • SMECluster
  • Valuechain Network Portal (previously known as iQluster)
  • Composition
  • Nimble
  • B2BMarket

In order to add a new data source to the Matchmaking Service, a new Apache NiFi pipeline must be established. Each of the currently integrated platforms stores its own name in the basePlatform field so that the data sources can be distinguished later.

Adjust the Matchmaking Ontology

In order to use more attributes with the Matchmaking Service - for example to allow more search options - the respective Matchmaking Ontology classes may be modified. A complete redeployment of the matchmaking service will then create the new fields in the Apache Solr service.
