All notable changes to the OSMH Consumer Indexer will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

For each release, use the following sub-sections:

  • Added (for new features)
  • Changed (for changes in existing functionality)
  • Deprecated (for soon-to-be removed features)
  • Removed (for now removed features)
  • Fixed (for any bug fixes)
  • Security (in case of vulnerabilities)

[2.3.1] - 2021-02-11

Added

  • Add ADP to the list of harvested endpoints (#201)


Changed

  • Included DANS twice with different metadata parameters to pick up English and Dutch study versions (#280)
  • Improved the debug logging of studies dropped for having no languages with the minimum required fields (#278)

[2.3.0] - 2021-02-09

Added

  • Add HTTP compression to the repository handler (#167)
  • Add Code of Conduct file (#174)
  • Add new PROGEDO endpoint (#177)
  • Harvest each repository endpoint with a dedicated thread (#178, #225)
  • Add SODA endpoint (#190)
  • Add option to set default language as part of endpoint specification (#192)
  • Add more details to 'Configured Repos' log output (#195)
  • Add code as an additional field in the indexer model (#199)
  • Add ADP Kuha2 Endpoint (#201)
  • Add stopwords for Hungarian and Portuguese language analysers (#204)
  • Improve the logging of remote repository handlers (#207)
  • Implement a country filter so that only countries with ISO country codes are accepted (#214)
  • Delete inactive records from Elasticsearch (#217)
  • Add a run_type variable to the logs to distinguish different types of harvester runs (#227)
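
The country filter entry above (#214) can be illustrated with a minimal sketch. It assumes the filter checks values against the ISO 3166-1 alpha-2 codes shipped with the JDK; the class and method names are hypothetical and are not taken from the indexer's code base.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: names are illustrative only, not the indexer's actual classes.
final class IsoCountryFilter {

    // ISO 3166-1 alpha-2 codes known to the JDK, for example "GB" or "NL".
    private static final Set<String> ISO_CODES =
            new HashSet<>(Arrays.asList(Locale.getISOCountries()));

    private IsoCountryFilter() {
    }

    // Returns true when the value is a recognised two-letter country code (case-insensitive).
    static boolean isIsoCountry(String code) {
        return code != null && ISO_CODES.contains(code.trim().toUpperCase(Locale.ROOT));
    }

    // Keeps only the entries that are recognised ISO country codes, preserving their order.
    static List<String> filter(List<String> codes) {
        return codes.stream()
                .filter(IsoCountryFilter::isIsoCountry)
                .collect(Collectors.toList());
    }
}
```

Under these assumptions, a list such as "GB", "Atlantis", "nl" would be reduced to "GB" and "nl"; free-text country names would need a separate mapping step that is not shown here.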


Changed

  • Remove "not available" if no PID agency is present (#156)
  • Revise XML Schema Definition to ensure compliance with system implementation (#59)
  • Search Optimisation (#131)
  • Modify the harvester to output the required logs (#159)
  • Disable access to external XML entities in the repository handlers (#176)
  • Log statistics for created, deleted and updated studies (#181)
  • Cleaning Publisher filter (#183)
  • Update Elasticsearch to 5.6 (#188)
  • Support Spring Boot Admin 2 for metrics and remote management (#191, #211)
  • Add more details to 'Configured Repos' log output (#194)
  • Change SODA publisher name (#197)
  • Update SND set spec (#200)
  • Refine the list of fields to be indexed (#238)
  • Map langAvailableIn as a keyword, so that it can be used for sorting and filtering (#241)
  • Add a search field for country metadata (#252)
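
As an illustration of the external XML entity change listed above (#176), the following is a sketch of one common way to harden a JAXP parser using the JDK's built-in implementation. The class name and the exact combination of settings are assumptions; the repository handlers may configure their parsers differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: the name and the chosen settings are illustrative assumptions.
final class HardenedXml {

    private HardenedXml() {
    }

    static DocumentBuilderFactory newFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Reject any DOCTYPE declaration, which rules out entity definitions altogether.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // JAXP 1.5 properties: an empty value means no protocol is allowed for external access.
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }

    // Parses a string with the hardened settings; a document carrying a DOCTYPE is rejected.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        return newFactory().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

With these settings an ordinary record parses as usual, while a document that declares an external entity fails with a SAXException instead of being resolved.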


Fixed

  • Set the study URL field from any language before replacing it with the language-specific element (#142)
  • Fix alphabetical sorting issues caused by not normalising upper and lower case letters (#171)
  • Fix rejection reason not showing in the logs (#184)
  • Clean up code (#203)
  • Fix title ascending/descending sort options not functioning (#209)

[2.2.1] - 2020-05-04

OSMH Consumer Indexer - 10.5281/zenodo.3786356

Added

  • new GESIS endpoint (#162)
  • file appender
  • format error log message for successful indexing
  • implemented correlation ID using MDC.putCloseable
  • correlation ID to the log messages
  • dependency for JSON logging support (logstash-logback-encoder 5.2)


Changed

  • changed GESIS endpoint from HTTP to HTTPS (#162)
  • use Java Time APIs for the PerfRequestSyncInterceptor stopwatch
  • increased test coverage
  • updated SonarQube scanner to 3.7.0
  • updated Spring Boot to 1.5.21
  • unified timeout and SSL verification settings
  • refined error log for unsuccessful indexing
  • marked all utility classes as final
  • close the Elasticsearch client on shutdown
  • revised and re-ordered list of endpoints
  • use Jib to containerise the indexer
  • updated Maven wrapper to 0.5.3
  • refactored the error handling code in DaoBase.postForStringResponse to better align with Java best practices
  • refactored exception handling to avoid catching RuntimeException and a cast
  • print the config in StatusService.printPaSCHandlerOaiPmhConfig() directly
  • change behaviour when Study PID Agency is not specified. Before: '10.5279/DK-SA-DDA-868 (not available)'. After: '10.5279/DK-SA-DDA-868 (Agency not available)' (#156)
  • log queries at the info level
  • moved recursion out of the try-with-resources block to reduce resource consumption
  • reformatted the message when the record headers could not be parsed (because the parser could have failed at any point and left the InputStream in an inconsistent state)
  • use input streams instead of strings (avoids a double copy)
  • renamed dev profile to gcp
  • improved logging to help determine quality of harvested metadata (#191)
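
The entry above about moving recursion out of the try-with-resources block can be sketched as follows. This is an assumption-laden illustration, not the indexer's code: the PagedHarvest class, the "payload|nextToken" page format and the in-memory page map are all hypothetical. The point it shows is that each stream is closed before the next page is requested, so only one stream is open at any time regardless of how many pages are followed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical paged source: each page is "payload|nextToken", with an empty token on the last page.
final class PagedHarvest {
    final AtomicInteger open = new AtomicInteger(); // streams currently open
    int maxOpen;                                    // highest number of streams open at once
    private final Map<String, String> pages;

    PagedHarvest(Map<String, String> pages) {
        this.pages = pages;
    }

    // Opens a page and counts it as open until close() is called.
    private InputStream openPage(String token) {
        maxOpen = Math.max(maxOpen, open.incrementAndGet());
        byte[] bytes = pages.get(token).getBytes(StandardCharsets.UTF_8);
        return new ByteArrayInputStream(bytes) {
            @Override
            public void close() throws IOException {
                open.decrementAndGet();
                super.close();
            }
        };
    }

    private static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    void harvest(String token, List<String> sink) throws IOException {
        String next;
        try (InputStream in = openPage(token)) {
            String[] parts = readAll(in).split("\\|", -1);
            sink.add(parts[0]);
            next = parts[1];
        }
        // The stream is already closed here, so the recursive call holds no open resources.
        if (!next.isEmpty()) {
            harvest(next, sink);
        }
    }
}
```

Had the recursive call been placed inside the try block, every page's stream would stay open until the last page had been processed.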


Deprecated

  • N/A


Removed

  • catches of RuntimeException in ESIngestService
  • option to disable HTTPS verification
  • unnecessary null check


Fixed

  • compiler warnings, as recommended by Error Prone
  • time zone bugs
  • logging pattern for the file logger
  • unused micrometer dependencies
  • unused DocumentBuilder bean
  • issues reported by SonarQube
  • register DocumentBuilderFactories as beans instead of DocumentBuilders. DocumentBuilders are not thread safe and need resetting after use. DocumentBuilderFactory.newDocumentBuilder() is thread safe and should be used instead
  • fixed logs not showing in Spring Boot Admin
  • encoded the resumption token in case characters invalid for URIs are returned
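
The resumption token fix in the list above can be sketched as below. It assumes the token is sent as a query parameter of an OAI-PMH ListRecords request; the class name, method name and base URL are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: names are illustrative, not taken from the indexer.
final class ResumptionTokens {

    private ResumptionTokens() {
    }

    // Builds a follow-up request URL, percent-encoding characters that are not valid in a query value.
    static String listRecordsUrl(String baseUrl, String resumptionToken)
            throws UnsupportedEncodingException {
        String encoded = URLEncoder.encode(resumptionToken, StandardCharsets.UTF_8.name());
        return baseUrl + "?verb=ListRecords&resumptionToken=" + encoded;
    }
}
```

URLEncoder applies form encoding, so a space in the token becomes "+" and characters such as "/" and "=" are percent-encoded.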


Security

  • verify SSL
  • removed the option to disable HTTPS verification