Service that indexes objects in Elasticsearch and provides the search webservice.
This project contains all the code required to index objects using Elasticsearch. It also provides the search webservice to fetch the indexed objects from Elasticsearch.
Note: There is no authentication required on the endpoints since they are not exposed to the outside world. The search-indexer is available only inside the k8s cluster and hence needs port-forwarding to be invoked locally.
E.g. kubectl port-forward service/smb-search-indexer 8085:8085
The search API is a JSON-based API that provides different projections to fetch object data with different levels of detail.
The desired projection can be added as a query parameter to the URL as `?projection=oneOf(flat,full,id)`.
- id: Objects contain only the `id` attribute. Intended for existence checks.
- flat (default): Objects contain simple attributes (number, string, string[]) with display information. Intended for list views.
- full: Objects contain complex attributes, mainly with nested `.formatted` and `.markup` attributes. Intended for detail views.
{
"id": number
}
{
"@id": string,
"@initialImport": datetime,
"@lastSynced": datetime,
"acquisition": [string],
"archiveContent": string,
"assets": [string],
"attachments": boolean,
"assortments": [string],
"collection": string,
"collectionKey": string,
"compilation": string,
"creditLine": string,
"culturalReferences": [string],
"dateRange": {
"gte": date(epoch_seconds),
"lte": date(epoch_seconds)
},
"dating": [string],
"description": string,
"dimensionsAndWeight": [string],
"exhibit": boolean,
"exhibitionSpace": [string],
"exhibitions": [string],
"findSpot": string,
"geographicalReferences": [string],
"highlight": boolean,
"iconclasses": [string],
"iconography": [string],
"id": number,
"identNumber": string,
"inscriptions": [string],
"involvedParties": [string],
"keywords": [string],
"literature": [string],
"location": string,
"materialAndTechnique": [string],
"permalink": string,
"provenance": [string],
"provenanceEvaluation": string,
"signatures": [string],
"technicalTerm": string,
"title": string,
"titles": [string]
}
{
"@id": string,
"@initialImport": datetime,
"@lastSynced": datetime,
"acquisition": [string],
"archiveContent": string,
"assets": [{
"id": number,
"filename": string,
"linkTemplate": string,
"formatted": string
}],
"attachments": boolean,
// candidate for BREAKING CHANGE, may become object[]
"assortments": [string],
"collection": string,
"collectionKey": string,
"compilation": string,
"creditLine": string,
"culturalReferences": [{
"id": number,
"typeId": number,
"denominationId": number,
"name": string,
"formatted": string,
"markup": string
}],
"dateRange": {
"gte": date(epoch_seconds),
"lte": date(epoch_seconds),
"formatted": string
},
"dating": [string],
"description": {
"formatted": string,
"markup": string
},
"dimensionsAndWeight": [string],
"exhibit": boolean,
"exhibitionSpace": [string],
// *BREAKING CHANGE* changed from string[] to object[]
"exhibitions": [{
"id": number,
"title": string,
"formatted": string,
"markup": string
}],
"findSpot": string,
"geographicalReferences": [{
"id": number,
"typeId": number,
"denominationId": number,
"location": string,
"details": string,
"formatted": string,
"markup": string
}],
"highlight": boolean,
// *BREAKING CHANGE* changed from string[] to object[]
"iconclasses": [{
"id": number,
"key": string,
"formatted": string,
"markup": string
}],
// *BREAKING CHANGE* changed from string[] to object[]
"iconography": [{
"id": number,
"formatted": string,
"markup": string
}],
"id": number,
"identNumber": string,
// candidate for BREAKING CHANGE, may become object[]
"inscriptions": [string],
"involvedParties": [{
"id": number,
"name": string,
"dateRange": string,
"denominationId": number,
"formatted": string,
"markup": string
}],
// *BREAKING CHANGE* changed from string[] to object[]
"keywords": [{
"id": number,
"formatted": string,
"markup": string
}],
// candidate for BREAKING CHANGE, may become object[]
"literature": [string],
// candidate for BREAKING CHANGE, may become object
"location": string,
"materialAndTechnique": [{
"id": number,
"name": string,
"typeId": number,
"formatted": string,
"markup": string
}],
"permalink": {
"formatted": string,
"markup": string
},
"provenance": [string],
"provenanceEvaluation": string,
// candidate for BREAKING CHANGE, may become object[]
"signatures": [string],
// *BREAKING CHANGE* changed from string to object
"technicalTerm": {
"formatted": string,
"markup": string
},
"title": string,
// *BREAKING CHANGE* changed from string[] to object[]
"titles": [{
"formatted": string,
"markup": string
}]
}
There is a dedicated inventory endpoint that allows for fetching all indexed objects. This is highly relevant for comparison of indexed objects against objects stored in Hasura and objects published for SMB Online in MDS.
The inventory endpoint allows specifying the language, the separator, and a start and end id for a partial inventory.
Parameter | Type | Optional | Default Value | Comment |
---|---|---|---|---|
lang | oneOf(de) | x | de | The language, currently only DE is supported |
startId | number | x | 1 | Lower bound of the object ids for a partial inventory |
endId | number | x | 999999999999999 | Upper bound of the object ids for a partial inventory |
sep | string | x | \n | Separator to use in between object ids |
GET
.../index/listing
.../index/listing?startId=2347
.../index/listing?endId=37511
.../index/listing?startId=23472&endId=37511&sep=,
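The comparison use case mentioned above could be sketched as follows. This is only an illustration: it assumes the listing response is plain text containing the object ids separated by the `sep` value, and `InventoryDiff` with its methods is a hypothetical helper that is not part of this project.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of the project) to compare an inventory listing with another id source. */
final class InventoryDiff {

    private InventoryDiff() {
    }

    /** Splits a listing body into ids; assumes ids are separated by the literal separator passed as "sep". */
    static Set<Long> parseListing(final String body, final String sep) {
        return Arrays.stream(body.split(Pattern.quote(sep)))
                .map(String::trim)
                .filter(token -> !token.isEmpty())
                .map(Long::valueOf)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    /** Returns the ids that are expected (e.g. known from Hasura) but absent from the index listing. */
    static Set<Long> missingFromIndex(final Set<Long> expected, final Set<Long> indexed) {
        final Set<Long> missing = new TreeSet<>(expected);
        missing.removeAll(indexed);
        return missing;
    }
}
```

Swapping the two arguments of `missingFromIndex` yields the opposite view: ids that are indexed but no longer present in the other source.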
For indexing the `IndexController` is available. It provides 4 endpoints to create, update and delete objects from the index.
- `POST /index` - Notify to reindex objects by their id
- `PUT /index` - Pass a normalized object for (re-)indexing
- `DELETE /index/{id}` - Remove an object from the index
- `POST /index/force-full-reindex` - Force full reindexing of all objects available in Hasura
The POST endpoint only expects the `ids` of the objects. The normalization is performed in the code.
The PUT endpoint expects an already normalized object.
The DELETE endpoint expects no payload, only the id of the target object in the URL.
The force POST endpoint requires neither parameters nor a payload. However, a start and end id can be specified for partial reindexing.
Examples:
POST
{
"ids": ["122143..362234"]
}
{
"ids": [9, 10, 11, 12, 133, 2443, 3434, 324432]
}
{
"ids": [12, "2132..2245", 3434, "98234..98241", 324432]
}
PUT
{
"@id": "782485",
"@initialImport": "2021-01-28T08:22:37.790821+00:00",
"@lastSynced": "2021-01-28T08:22:37.790821+00:00",
"attachments": true,
"collection": "Kupferstichkabinett",
"dateRange": {
"gte": "1808-01-01",
"lte": "1812-12-31"
},
"dating": [
"um 1810"
],
"dimensionsAndWeight": [
"Abmessungen: 22,9 x 18,2 cm"
],
"exhibit": true,
"highlight": false,
"id": 782485,
"identNumber": "SZ CD.Friedrich 1",
"involvedParties": [{
"id": 23,
"name": "Caspar David Friedrich",
"dateOfBirth": "1789-04-21",
"dateOfDeath": "1840-12-18",
"roleId": 12,
"formatted": "Herstellung: Caspar David Friedrich (1789-1840), Zeichner"
}],
"location": "Neues Museum, Ebene 0, R002",
"longDescription": "Von den acht erhaltenen gezeichneten Selbstbildnissen Friedrichs ist dieses das berühmteste.",
"materialAndTechnique": [{
"id": 234,
"specificTypeId": 213,
"typeId": 32535,
"details": "Graue Kreide, auf Papier",
"formatted": "Graue Kreide, auf Papier"
}],
"technicalTerm": "Zeichnung",
"titles": [
"Selbstbildnis"
]
}
force POST
.../index/force-full-reindex?startId=23472&endId=37511
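The `ids` array in the POST examples above mixes single ids with `"start..end"` range strings. A minimal sketch of how such a list could be expanded into individual ids is shown here. It assumes that both range bounds are inclusive, which the examples do not state, and `IdRangeExpander` is a hypothetical name rather than a class of this project.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the project) that expands an "ids" payload into single ids. */
final class IdRangeExpander {

    private IdRangeExpander() {
    }

    /** Accepts numbers and "start..end" strings; ranges are treated as inclusive (an assumption). */
    static List<Long> expand(final List<?> ids) {
        final List<Long> result = new ArrayList<>();
        for (final Object entry : ids) {
            final String text = String.valueOf(entry).trim();
            final int separator = text.indexOf("..");
            if (separator < 0) {
                // plain id, e.g. 3434
                result.add(Long.parseLong(text));
                continue;
            }
            final long start = Long.parseLong(text.substring(0, separator));
            final long end = Long.parseLong(text.substring(separator + 2));
            if (start > end) {
                throw new IllegalArgumentException("Invalid id range: " + text);
            }
            for (long id = start; id <= end; id++) {
                result.add(id);
            }
        }
        return result;
    }
}
```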
There is an additional REST endpoint `/triggers/index-event` exposed by `EventController` that is supposed to be called from Hasura.
It expects an event-trigger payload with object info in the request. The implementation behind this endpoint is similar to calling `/index` with a single element in the `ids` array.
Example:
POST
{
"event": {
"data": {
"new": {
"id": 122143
}
}
}
}
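The relation between the event payload and the `/index` payload can be illustrated with a small sketch. It assumes the JSON has already been parsed into nested maps (for example by Jackson), and `IndexEventMapper` is a hypothetical name; the actual `EventController` may work differently.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not the project's code): maps an event-trigger payload to an /index payload. */
final class IndexEventMapper {

    private IndexEventMapper() {
    }

    /** Reads event.data.new.id and wraps it into a single-element "ids" list. */
    static Map<String, List<Long>> toIndexRequest(final Map<String, ?> trigger) {
        Object node = trigger;
        for (final String key : List.of("event", "data", "new", "id")) {
            if (!(node instanceof Map)) {
                throw new IllegalArgumentException("Missing parent object for attribute '" + key + "'");
            }
            node = ((Map<?, ?>) node).get(key);
        }
        if (!(node instanceof Number)) {
            throw new IllegalArgumentException("event.data.new.id must be a number");
        }
        return Map.of("ids", List.of(((Number) node).longValue()));
    }
}
```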
For searching the `SearchController` is available. It provides 5 endpoints to fetch data from the index.
- `GET /search` - Run a (simple) search with query parameters
- `POST /search` - Run an (advanced) search with filters in the payload
- `GET /search/suggestions` - Get autocomplete search suggestions for a search term
- `GET /search/{id}` - Fetch an indexed object by id
- `GET /search/{id}/export` - Fetch object data as download file
The regular search works with query parameters.
Parameter | Type | Optional | Default Value | Comment |
---|---|---|---|---|
q | string | x | * | The search term |
sort | string | x | -_score | Sorting fields, multiple separated by comma. Sort direction is specified by leading +/- |
offset | number | x | 0 | First index of requested result for paginated requests |
limit | number | x | 20 | Number of requested results for paginated requests |
projection | oneOf(flat,full,id) | x | flat | Defines the response data structure |
lang | oneOf(de) | x | de | The language, currently only DE is supported |
Note: The `q` parameter also allows for extended search syntax like `?q=titles:Museum+AND+technicalTerm:(Bild+OR+Bildnis)`.
Note: offset+limit must not be greater than 50,000, which is a limit defined by Elasticsearch.
Example:
GET
.../search?q=Muse*&sort=-titles&offset=50&limit=25
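A client could assemble such a request URL and check the paging limit up front, as in the following sketch. The class `SearchUrlBuilder` is hypothetical, and the base URL `http://localhost:8085` used in the usage note simply assumes the port-forwarding described at the top of this document.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical client-side helper (not part of the project) to build GET /search URLs. */
final class SearchUrlBuilder {

    /** Upper bound for offset+limit as stated in this document. */
    static final long MAX_RESULT_WINDOW = 50_000L;

    private SearchUrlBuilder() {
    }

    static String build(final String baseUrl, final String q, final String sort, final int offset, final int limit) {
        // use long arithmetic so that large int values cannot overflow the check
        if (offset < 0 || limit < 0 || (long) offset + limit > MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("offset+limit must be between 0 and " + MAX_RESULT_WINDOW);
        }
        return baseUrl + "/search"
                + "?q=" + encode(q)
                + "&sort=" + encode(sort)
                + "&offset=" + offset
                + "&limit=" + limit;
    }

    /** Form-encodes a value; a leading "+" sort direction becomes %2B so it is not decoded as a space. */
    private static String encode(final String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

For instance, `SearchUrlBuilder.build("http://localhost:8085", "Muse*", "-titles", 50, 25)` produces the query string of the example above.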
The advanced search provides the same query parameters as the simple search but also allows the search filters to be defined in the payload. In this case the underlying implementation builds an advanced search term from the filters and joins it with the optional `q` parameter.
Field | Type | Optional | Default Value | Comment |
---|---|---|---|---|
q_advanced | complex[] | | | The advanced search filters |
q_advanced[n].operator | enum(AND, OR, AND_NOT) | x | n=0 AND, else OR | Operator for filter combination |
q_advanced[n].field | string | | | Name of field on which this filter applies |
q_advanced[n].q | string | | | Search term for field specific search |
Note: The `q` attribute also allows for extended search syntax like `Bild+OR+Bildnis`.
Example:
POST
{
"q_advanced": [
{
"operator": "AND",
"field": "exhibit",
"q":"true"
},
{
"operator":"AND",
"field": "titles",
"q":"Frau"
},
{
"operator":"AND",
"field": "dateRange",
"q":"[1900-01-01 TO *]"
},
{
"operator":"AND",
"field": "technicalTerm",
"q":"Bild OR Bildnis"
}
]
}
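How the filters might be turned into a single search term can be sketched as follows. This is an assumption about the behavior and not the project's actual code: the sketch starts with `q` (or `*` if absent), appends each filter as `field:term` with its operator, applies the documented operator defaults, and renders `AND_NOT` as `AND NOT`. The names `AdvancedQuerySketch` and `Filter` are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch (not the project's implementation) of joining q with q_advanced filters. */
final class AdvancedQuerySketch {

    /** One entry of the q_advanced array. */
    static final class Filter {
        final String operator; // AND, OR, AND_NOT, or null to use the default
        final String field;
        final String q;

        Filter(final String operator, final String field, final String q) {
            this.operator = operator;
            this.field = field;
            this.q = q;
        }
    }

    private AdvancedQuerySketch() {
    }

    static String toSearchTerm(final String q, final List<Filter> filters) {
        final String base = q == null || q.trim().isEmpty() ? "*" : q.trim();
        final StringBuilder term = new StringBuilder(group(base));
        for (int n = 0; n < filters.size(); n++) {
            final Filter filter = filters.get(n);
            // documented default: AND for the first filter, OR for all others
            final String operator = filter.operator != null ? filter.operator : (n == 0 ? "AND" : "OR");
            term.append(' ').append(operator.replace('_', ' ')).append(' ')
                    .append(filter.field).append(':').append(group(filter.q));
        }
        return term.toString();
    }

    /** Wraps multi-word terms in parentheses; range expressions like [a TO b] are kept as they are. */
    private static String group(final String q) {
        final boolean isRange = q.startsWith("[") || q.startsWith("{");
        return q.contains(" ") && !isRange ? "(" + q + ")" : q;
    }
}
```

Under these assumptions, the example payload above without a `q` parameter would yield `* AND exhibit:true AND titles:Frau AND dateRange:[1900-01-01 TO *] AND technicalTerm:(Bild OR Bildnis)`.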
Search suggestions can be retrieved for a fulltext search term or for concrete fields. If field-specific search suggestions are requested, the `q` parameter must be prefixed with the requested field name and a colon.
Parameter | Type | Optional | Default Value | Comment |
---|---|---|---|---|
q | string | | | The search term, possibly prefixed with field name |
limit | number | x | 15 | Number of requested suggestions |
lang | oneOf(de) | x | de | The language, currently only DE is supported |
Examples:
GET
.../search/suggestions?q=Fra
.../search/suggestions?q=titles:Fra&limit=15
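The field prefix convention could be interpreted as in the following sketch. It assumes that the first colon separates the field name from the term; `SuggestionQuery` is a hypothetical class and not the project's parser.

```java
/** Hypothetical sketch (not part of the project) that splits a suggestion query into field and term. */
final class SuggestionQuery {

    /** Field name, or null for a fulltext suggestion. */
    final String field;
    final String term;

    private SuggestionQuery(final String field, final String term) {
        this.field = field;
        this.term = term;
    }

    /** Treats everything before the first colon as the field name (an assumption). */
    static SuggestionQuery parse(final String q) {
        final int colon = q.indexOf(':');
        if (colon <= 0) {
            return new SuggestionQuery(null, q);
        }
        return new SuggestionQuery(q.substring(0, colon), q.substring(colon + 1));
    }
}
```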
The fetch endpoint does not support parameters. It only expects the id of the requested object in the url.
Example:
GET
.../search/372352
The export endpoint allows specification of the export format. Supported formats are `json` and `csv` (default).
Note: The export projection is always `flat`.
Parameter | Type | Optional | Default Value | Comment |
---|---|---|---|---|
format | oneOf(csv,json) | x | csv | The desired export format |
lang | oneOf(de) | x | de | The language, currently only DE is supported |
Example:
GET
.../search/372352/export?format=json
- Docker
- Java/Kotlin
- Spring Boot
- Jackson
- Elasticsearch API
- JUnit
- Sonar
- SMB Online-Sammlungen API (Hasura)
- SMB Elasticsearch
The code is written in Java and Kotlin. There is no actual restriction on when to use which. The general processing of a request is a 3-level implementation:
- Controller
- Service
- API
The `Controller` receives the request and does all required transformation to pass the data to a `Service`. In the `Service`, business logic is applied. For data access and data write operations the `Service` uses an `API`.
To facilitate a clean code structure the processing is always unidirectional; an API never accesses a Service, a Service never accesses a Controller.
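The unidirectional layering can be illustrated with a stripped-down sketch in plain Java. All class names here (`InMemoryIndexApi`, `TitleService`, `TitleController`) are hypothetical stand-ins without Spring annotations; the real classes in this project are wired by Spring Boot and talk to Elasticsearch and Hasura.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** API level: data access only. Hypothetical in-memory stand-in for a real data source. */
final class InMemoryIndexApi {
    private final Map<Long, String> titlesById = new HashMap<>();

    void save(final long id, final String title) {
        this.titlesById.put(id, title);
    }

    Optional<String> findTitle(final long id) {
        return Optional.ofNullable(this.titlesById.get(id));
    }
}

/** Service level: business logic. Knows the API, never the Controller. */
final class TitleService {
    private final InMemoryIndexApi api;

    TitleService(final InMemoryIndexApi api) {
        this.api = api;
    }

    String displayTitle(final long id) {
        return this.api.findTitle(id).map(String::trim).orElse("[untitled]");
    }
}

/** Controller level: transforms the raw request and delegates to the Service. */
final class TitleController {
    private final TitleService service;

    TitleController(final TitleService service) {
        this.service = service;
    }

    String handleGet(final String rawId) {
        final long id = Long.parseLong(rawId.trim());
        return this.service.displayTitle(id);
    }
}
```

Dependencies are passed through constructors and only point downwards, so neither the API nor the Service holds a reference to the level above it.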
The required env-vars are defined in application.yml.
Note: There is an additional application-test.yml that is applied during test execution and (partially) overrides what is defined in application.yml.
The base package of the application is `de.smbonline.searchindexer`.
Package | Description |
---|---|
`de.smbonline.searchindexer` | Application entry point |
`de.smbonline.searchindexer.api` | API implementations for Elasticsearch and (Hasura) GraphQL |
`de.smbonline.searchindexer.conf` | Runtime configuration incl. configuration wrappers for application.yml |
`de.smbonline.searchindexer.dto` | Data transfer objects |
`de.smbonline.searchindexer.log` | Logging utils |
`de.smbonline.searchindexer.norm` | Normalizer implementations |
`de.smbonline.searchindexer.rest` | Controller implementations and utilities shared between Controllers |
`de.smbonline.searchindexer.service` | Service implementations |
`de.smbonline.searchindexer.util` | Shared utility classes and functions |
- The code is based on @NonNullApi; everything that can have a null value is flagged as @Nullable. (Java)
- All access to member attributes is qualified with `this.`. (Java)
- Method arguments are `final`. (Java)
- Every code change needs to be reflected by an increased version number in build.gradle.kts.
- We use fully-qualified imports for interfaces and classes.
- We use static wildcard imports for constants and methods from static-only utility classes.
- Developers run `gradlew sonar` before pushing code to Git.