Research

A Method for Ontology-aware Search and Analytics using Elasticsearch

 2024.9.30.

In recent years, traditional information retrieval (IR) based on literal (keyword) matching has been extended to semantic search, which aims to match meaning rather than exact wording.

Methods for semantic search can be broadly divided into two categories: query processing and semantic indexing. Query processing applies various semantic operations to the user's query (one or more terms), whereas semantic indexing adds semantic information to the data being searched. Query processing includes query expansion, query refinement, and query disambiguation; the most widely used of these is query expansion, for which corpus-based, ontology-based, and other methods have been proposed. Semantic indexing methods, on the other hand, include annotating each term with semantic information drawn from ontologies such as WordNet, and retrieving terms related to each term and indexing them together.
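As a minimal sketch of ontology-based query expansion, the Python code below expands each query term with all of its narrower concepts in a concept hierarchy. The toy ontology, its concept names, and the functions `descendants` and `expand_query` are hypothetical illustrations, not taken from the study; a real system would load the hierarchy from an ontology such as WordNet or a domain ontology.

```python
from collections import deque

# Hypothetical toy ontology (not from the study): concept -> direct sub-concepts.
ONTOLOGY = {
    "infectious disease": ["respiratory infection", "gastrointestinal infection"],
    "respiratory infection": ["influenza", "COVID-19"],
    "gastrointestinal infection": ["norovirus infection"],
}


def descendants(concept, ontology):
    """Return every concept below `concept`, in breadth-first order."""
    result, seen = [], set()
    queue = deque(ontology.get(concept, []))
    while queue:
        child = queue.popleft()
        if child in seen:
            continue
        seen.add(child)
        result.append(child)
        queue.extend(ontology.get(child, []))
    return result


def expand_query(terms, ontology):
    """Expand each query term with its narrower concepts, dropping duplicates."""
    expanded = []
    for term in terms:
        for t in [term] + descendants(term, ontology):
            if t not in expanded:
                expanded.append(t)
    return expanded


print(expand_query(["respiratory infection"], ONTOLOGY))
# → ['respiratory infection', 'influenza', 'COVID-19']
```

The expanded terms can then be OR-ed together in the search query, so that a query for a broad concept also retrieves documents that mention only its sub-concepts.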

Query processing methods are relatively simple to implement, but they cannot fully exploit semantic knowledge because they do not process the raw data. Semantic indexing methods, in turn, are limited in that the retrieval results depend heavily on the knowledge base used, and in practice not all knowledge can be taken into account. It is therefore reasonable to combine the two approaches to further improve the semantic relevance of retrieval.

Elasticsearch is a distributed, real-time search engine built on Lucene. Although it focuses on full-text retrieval, it also provides rich functionality for structured data search and supports various analytics functions. Its main strengths are its ease of use, search speeds much faster than those of traditional databases, and the ability to perform real-time or near-real-time search and analytics on large volumes of data. Search results can be displayed with Kibana, which is provided alongside it, or with a custom-built interface.
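To illustrate how a single Elasticsearch request can combine full-text retrieval, structured search, and analytics, the sketch below builds a Query DSL request body as a Python dictionary. The index fields `description`, `region`, and `report_date`, the aggregation name `by_month`, and the function `build_search_body` are hypothetical; the sketch assumes `description` is an analyzed `text` field, `region` is mapped as a `keyword` field, and `report_date` as a `date` field. The resulting body could be sent to an index's `_search` endpoint (for example `POST /<index>/_search`).

```python
import json


def build_search_body(text, region, since, size=10):
    """Build a hypothetical Elasticsearch _search request body.

    - `match` in `must` performs scored full-text retrieval.
    - `term` and `range` in `filter` perform unscored structured search.
    - the `date_histogram` aggregation counts matching documents per month.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"description": text}}],
                "filter": [
                    {"term": {"region": region}},
                    {"range": {"report_date": {"gte": since}}},
                ],
            }
        },
        "aggs": {
            "by_month": {
                "date_histogram": {"field": "report_date", "calendar_interval": "month"}
            }
        },
    }


body = build_search_body("fever and cough", "District A", "2023-01-01")
print(json.dumps(body, indent=2))
```

Placing the structured conditions under `filter` rather than `must` keeps them out of relevance scoring, so only the full-text part influences the ranking of hits.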

In recent years, many IR systems have adopted Elasticsearch. Most of them mainly use its text retrieval, while some studies have also used its structured search and analytics functions. Other studies have applied it to image retrieval or to semantic labeling of text data, but these exploited its high-performance text retrieval rather than using it for semantic search.

To respond efficiently to the diverse requirements posed by the vast amounts of data generated in IoT systems, it is reasonable to build on a fast distributed search engine such as Elasticsearch and to add semantic information to it.

To realize semantic search over big data, the Laboratory of IoT Applications at the IoT Technology Institute, Faculty of Information Science, Kim Il Sung University, developed a method for ontology-aware search and analytics using Elasticsearch. By combining semantic indexing with query expansion, the method enables search and analytics that take semantic relations, such as hierarchical relations, into account for both structured and textual data. In our experiments, the model achieved an average 2.6-fold speedup over baseline models for structured search and analytics queries.
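One way to combine the two approaches, shown here only as a minimal sketch and not as the authors' implementation, is to store each document's concept together with all of its ancestors at index time (semantic indexing) and to expand the user's term with synonyms at query time (query expansion). A query for a broad concept then matches documents tagged with any narrower concept, and an aggregation over the original concept field breaks the hits down by specific concept. The toy ontology, the synonym table, the field names `concept` and `concept_path`, and all function names are hypothetical; `concept_path` is assumed to be mapped as a `keyword` field so that the `terms` query matches whole values. The `matches` helper only mimics the `terms` query locally so that the sketch runs without an Elasticsearch cluster.

```python
# Hypothetical toy ontology as child -> parent links (not the study's ontology).
PARENT = {
    "influenza": "respiratory infection",
    "COVID-19": "respiratory infection",
    "respiratory infection": "infectious disease",
}

# Hypothetical synonym table used for query-time expansion.
SYNONYMS = {"flu": ["influenza"]}


def ancestors(concept):
    """Walk child -> parent links up to the root of the hierarchy."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def enrich(doc):
    """Semantic indexing: store the concept plus all its ancestors in the document."""
    enriched = dict(doc)
    enriched["concept_path"] = [doc["concept"]] + ancestors(doc["concept"])
    return enriched


def build_query(term):
    """Query expansion: search concept_path for the term and its synonyms."""
    expanded = [term] + SYNONYMS.get(term, [])
    return {
        "query": {"terms": {"concept_path": expanded}},
        "aggs": {"by_concept": {"terms": {"field": "concept"}}},
    }


def matches(doc, body):
    """Local stand-in for the terms query: any expanded term in concept_path."""
    wanted = body["query"]["terms"]["concept_path"]
    return any(t in doc["concept_path"] for t in wanted)


docs = [enrich({"id": 1, "concept": "influenza"}),
        enrich({"id": 2, "concept": "COVID-19"})]
print([d["id"] for d in docs if matches(d, build_query("respiratory infection"))])
# → [1, 2]
print([d["id"] for d in docs if matches(d, build_query("flu"))])
# → [1]
```

In this sketch, storing ancestors at index time removes hierarchy traversal from the query path, at the cost of larger documents and the need to re-index when the ontology changes.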

The results of this study were published at the international conference 《2023 3rd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE)》 (India) under the title 《Ontology-aware Search and Analytics with Elasticsearch: Case study for Epidemiological Investigation》 (https://doi.org/10.1109/ICACITE57410.2023.10182931).