In EFPF, we follow the index-time merge architecture to implement a federated search approach. The main reason for selecting the index-time merge architecture over the search-time merge and hybrid architectures is that search indices are not available in the majority of the base platforms. Only NIMBLE currently provides a search index with text-based search functionality.
The index-time merge approach requires content from the base platforms to be acquired into a central index at the EFPF platform, in order to enable platform-level search for products/services and partners/companies across the four base platforms. The same approach is used to implement traditional enterprise search systems, in which information is retrieved across heterogeneous data sources within an enterprise. Figure 1 depicts the index-time merge architecture for federated search in EFPF.
Figure 1: Federated Search Architecture
The major advantages of the index-time merge architecture, as shown in Figure 1, are as follows:
- Because all data is acquired into a central index, sophisticated query enhancement and relevance-ranking algorithms can be applied, providing the user with high-quality search results.
- The selected search architectural approach allows for flexibility in the implementation of the recommendation and matchmaking engine.
- The indexed data and ML algorithms can be used to provide product, service and partner recommendations.
The disadvantages of the index-time merge architecture are summarised below:
- Acquiring the content from the various repositories and data sources of the base platforms requires considerable effort. For example, it needs to be done using scheduled read-only processes that must be designed and implemented at the data integration layer. This also requires a decision about the frequency of data ingestion into the central index: depending on the data velocity of each base platform, ingestion needs to be scheduled to run hourly, daily or weekly.
- For each different type of data source, an additional data connector needs to be implemented to enable the data integration.
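The scheduled, read-only ingestion process described above can be sketched as follows. This is a minimal illustration, assuming a generic fetch callable in place of a concrete base-platform API; all names and signatures here are illustrative, not part of the EFPF implementation.

```python
import time
from typing import Callable, Iterable


def ingest_once(fetch: Callable[[], Iterable[dict]],
                index: Callable[[dict], None]) -> int:
    """One read-only ingestion pass: pull records from a base-platform
    source and hand each one to the central index. Returns the count."""
    count = 0
    for record in fetch():
        index(record)
        count += 1
    return count


def run_scheduled(fetch: Callable[[], Iterable[dict]],
                  index: Callable[[dict], None],
                  interval_seconds: float,
                  passes: int) -> int:
    """Trigger ingestion at a fixed interval (hourly, daily or weekly
    in practice; seconds here only so the sketch is quick to run)."""
    total = 0
    for _ in range(passes):
        total += ingest_once(fetch, index)
        time.sleep(interval_seconds)
    return total
```

In the actual platform this scheduling is handled by the integration flow engine rather than application code, but the sketch shows the shape of the decision: the interval is a configuration value chosen per base platform.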
Deployment Architecture (Development Environment)
The following figure shows the deployment architecture of the matchmaking service in the current development environment.
Figure 2: Matchmaking Service Deployment Architecture
Matchmaking Data Flow
The following figure shows the data flow diagram of the matchmaking system and its interactions with other components in the EFPF platform.
Figure 3: Matchmaking Service Data Flow Diagram
Federated Search Data Flow Description
The main data flow runs from the base platform data stores (most base platforms expose their data stores via an API) to the federated index of the matchmaking service, via the indexing processes running on the integration flow engine (Data Spine). The data ingestion process is implemented as a set of Apache NiFi workflows. This data flow is triggered periodically to retrieve the latest data from each base platform; the schedule is configured in the NiFi processors.
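The indexing workflow just described can be sketched as three composable stages mirroring the NiFi processors (retrieve, transform, index); the stage signatures below are assumptions for illustration, not the actual processor interfaces.

```python
from typing import Callable, Iterable, List


def run_indexing_workflow(fetch: Callable[[], Iterable[dict]],
                          transform: Callable[[dict], dict],
                          index: Callable[[List[dict]], None]) -> int:
    """Mirror a NiFi indexing workflow: retrieve the latest records
    from a base platform, transform each into the federated model,
    and pass the batch to the Solr output stage. Returns batch size."""
    batch = [transform(record) for record in fetch()]
    index(batch)
    return len(batch)
```

Each base platform gets its own instantiation of this pipeline, with a platform-specific fetch stage and transformation rules, while the final indexing stage is shared.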
For the implementation of the federated search index as a data store, we use Apache Solr, a scalable, fault-tolerant search platform that provides distributed indexing, replication, load-balanced querying, automated failover and recovery, centralised configuration, and more.
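For illustration, the sketch below shows the shape of the HTTP interaction with a Solr core: an update payload for indexing documents and a select URL for querying. The core name `efpf-federated-index` and the field `label` are assumptions for this example, not the actual EFPF configuration.

```python
import json
from urllib.parse import urlencode

# Assumed Solr core name, for illustration only.
SOLR_BASE = "http://localhost:8983/solr/efpf-federated-index"


def build_update_payload(docs: list) -> str:
    """Solr's /update handler accepts a JSON array of documents;
    committing the update makes them visible to subsequent queries."""
    return json.dumps(docs)


def build_select_url(text: str, rows: int = 10) -> str:
    """Build a standard /select query URL against the federated index."""
    params = {"q": f'label:"{text}"', "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/select?{urlencode(params)}"
```

In deployment these requests would be issued by the NiFi Solr output processor and by the search API respectively, rather than by hand-written client code.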
As a first step, the matchmaking ontology content needs to be indexed into Apache Solr, so that all required information about participants and value units (domain knowledge) is available during the search process.
In the indexing workflow, a data transformation is performed from the incoming data model to the federated data model. The federated data model is discussed in detail under EFPF Manufacturing Ontology in the next section.
The data transformation is implemented using the NiFi Jolt transformation processor (JoltTransformJSON). Jolt (JSON Language for Transform) is an open-source JSON-to-JSON transformation library that allows developers to define transformation rules in a JSON specification file. The Jolt processor in Apache NiFi executes these rules against each incoming flow file, converting it to the target schema within the NiFi workflow. Finally, a Solr output processor is configured as the last processor in each indexing workflow to index the data into the EFPF federated index.
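The effect of such a Jolt shift specification can be illustrated with an equivalent plain-Python mapping. The incoming field names (`productId`, `name`, `desc`, `source`) and the federated field names below are illustrative assumptions, not the actual EFPF schemas.

```python
def to_federated_model(incoming: dict) -> dict:
    """Map an incoming base-platform record to the federated data
    model, analogous to a Jolt 'shift' spec: each line moves one
    source field to its location in the target schema."""
    return {
        "id": incoming["productId"],
        "label": incoming.get("name", ""),
        "description": incoming.get("desc", ""),
        "platform": incoming.get("source", "unknown"),
    }
```

In the real workflow the same mapping is declared as a Jolt JSON specification and executed by the NiFi processor, which keeps the transformation rules configurable without code changes.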