LLM-Based Natural Language to SPARQL Translation over Domain-Specific Knowledge Graph.
Semantic web applications are growing rapidly in complexity, data volume, and usage. At the same time, large language models (LLMs) have improved significantly in performance and capabilities, and they are now used in many fields and applications to support primary and secondary tasks. Their proven ability to process natural language (NL) has opened the door to NL-related tasks such as Knowledge Graph Question Answering (KGQA), which involves translating NL questions into SPARQL queries to retrieve answers from knowledge graphs (KGs). However, answering questions over domain-specific KGs is challenging because of complex schema structures, specialized vocabularies, and query complexity, so domain-agnostic and user-friendly KG querying mechanisms have become necessary. Motivated by this need, this paper presents an LLM-based approach for translating NL questions into SPARQL queries over a domain-specific KG, and investigates how various configurations of augmented KG data influence LLM responses. Our approach adopts a streamlined method for zero-shot SPARQL query generation by augmenting LLMs with different arrangements of previously extracted domain-specific KG information. Specifically, our experiments evaluate LLM-generated SPARQL responses to twenty manually crafted questions of varying complexity, using prompts augmented with different KG information: first, a reduced linearized KG, and second, discrete vocabulary information extracted from a reduced ontology KG. The results indicate that supplementing LLM prompts with discrete vocabulary information extracted from a reduced KG ontology yields competitive performance for the target LLMs compared to supplementing them with a reduced ontology. Ultimately, our approach reduces the size of the augmented KG information while preserving response accuracy, enables users from outside the domain to interact with domain-specific KG information and retrieve responses through a domain-agnostic interface, and facilitates benchmarking over a wide spectrum of LLMs. [ABSTRACT FROM AUTHOR]
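To make the two prompt configurations concrete, the following is a minimal sketch of how a zero-shot prompt could be augmented either with a linearized reduced KG or with discrete vocabulary extracted from it. It is not the authors' implementation: the toy ontology, the `ex:` terms, the prompt wording, and all function names (`linearize`, `extract_vocabulary`, `format_vocabulary`, `build_prompt`) are assumptions made for illustration, and the call to an actual LLM is left out.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy ontology; every ex: term is invented for illustration.
TOY_ONTOLOGY: List[Triple] = [
    ("ex:Sensor", "rdf:type", "owl:Class"),
    ("ex:Room", "rdf:type", "owl:Class"),
    ("ex:locatedIn", "rdf:type", "owl:ObjectProperty"),
    ("ex:locatedIn", "rdfs:domain", "ex:Sensor"),
    ("ex:locatedIn", "rdfs:range", "ex:Room"),
]

CLASS_TYPES = {"owl:Class", "rdfs:Class"}
PROPERTY_TYPES = {"owl:ObjectProperty", "owl:DatatypeProperty", "rdf:Property"}


def linearize(triples: List[Triple]) -> str:
    """Configuration 1: serialize every triple as one line of text."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def extract_vocabulary(triples: List[Triple]) -> Dict[str, List[str]]:
    """Configuration 2: keep only the declared class and property names."""
    classes, properties = set(), set()
    for s, p, o in triples:
        if p != "rdf:type":
            continue
        if o in CLASS_TYPES:
            classes.add(s)
        elif o in PROPERTY_TYPES:
            properties.add(s)
    return {"classes": sorted(classes), "properties": sorted(properties)}


def format_vocabulary(vocab: Dict[str, List[str]]) -> str:
    """Render the extracted vocabulary as two short lines of text."""
    return (
        "Classes: " + ", ".join(vocab["classes"]) + "\n"
        "Properties: " + ", ".join(vocab["properties"])
    )


def build_prompt(question: str, kg_context: str) -> str:
    """Assemble a zero-shot prompt (assumed wording, no examples included)."""
    return (
        "Translate the question into a SPARQL query using only the "
        "knowledge graph information below.\n\n"
        f"Knowledge graph information:\n{kg_context}\n\n"
        f"Question: {question}\nSPARQL:"
    )


question = "Which sensors are located in a room?"
prompt_linearized = build_prompt(question, linearize(TOY_ONTOLOGY))
prompt_vocabulary = build_prompt(
    question, format_vocabulary(extract_vocabulary(TOY_ONTOLOGY))
)

# Compare the size of the two augmented prompts.
print(len(prompt_linearized), len(prompt_vocabulary))
```

In this toy setting the vocabulary-based prompt is shorter than the linearized one, which mirrors the reduction in augmented KG information size described above; whether accuracy is preserved can only be judged by running the generated queries against the actual KG.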