Treffer: Distributed Asynchronous Regular Path Queries (RPQs) on Graphs

Title:
Distributed Asynchronous Regular Path Queries (RPQs) on Graphs
Contributors:
Well Honed Infrastructure Software for Programming Environments and Runtimes (Whisper), Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Oracle Labs, Faculty of Mathematics and Physics Charles University of Praha, Univerzita Karlova Praha, Česká republika = Charles University Prague, Czech Republic (UK)
Source:
Middleware Industrial Track '23: Proceedings of the 24th International Middleware Conference Industrial Track ; Middleware 2023: 24th International Middleware Conference ; https://inria.hal.science/hal-04355309 ; Middleware 2023: 24th International Middleware Conference, Dec 2023, Bologna, Italy, Italy. pp.35-41, ⟨10.1145/3626562.3626833⟩
Publisher Information:
HAL CCSD
ACM
Publication Year:
2023
Collection:
Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe)
Subject Geographic:
Document Type:
Konferenz conference object
Language:
English
DOI:
10.1145/3626562.3626833
Rights:
http://creativecommons.org/licenses/by/ ; info:eu-repo/semantics/OpenAccess
Accession Number:
edsbas.B4CBAA75
Database:
BASE

Weitere Informationen

International audience ; Graph engines play a crucial role in modern data analytics pipelines, serving as a middleware for handling complex queries across various domains, such as financial fraud detection. Graph queries enable flexible exploration and analysis, akin to SQL in relational databases. Among the most expressive and powerful constructs of graph querying are regular path queries (RPQs). RPQs enable support for variable-length path patterns based on regular expressions, such as (p1:Person)-/:Knows+/->(p2:Person) that searches for non-empty paths of any length between two persons. In this paper, we introduce a novel design for distributed RPQs that builds on top of distributed asynchronous pipelined traversals to enable (i) memory control of path explorations, with (ii) great performance and scalability. Through our evaluation, we show that with sixteen machines, it outperforms Neo4j by 91× on average and a relational implementation of the same queries in PostgreSQL by 230×, while maintaining low memory consumption. CCS Concepts • Information systems → Parallel and distributed DBMSs; Main memory engines; Database query processing; Graph-based database models; DBMS engine architectures.