Reverse-safe data structures for text indexing

Giulia Bernardini, Huiping Chen, Gabriele Fici, Grigorios Loukides, Solon P. Pissis

Research output: Chapter in Book/Report/Conference proceedingConference contribution

8 Citations (Scopus)

Abstract

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(nω log d) time, where ω is the matrix multiplication exponent. We show that, despite the nω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.

Original languageEnglish
Title of host publication2020 Proceedings of the Symposium on Algorithm Engineering and Experiments, ALENEX 2020
EditorsGuy Blelloch, Irene Finocchi
PublisherSociety for Industrial and Applied Mathematics Publications
Pages199-213
Number of pages15
ISBN (Electronic)9781611976007
DOIs
Publication statusPublished - 2020
Event2020 Symposium on Algorithm Engineering and Experiments, ALENEX 2020 - Salt Lake City, United States
Duration: 5 Jan 20206 Jan 2020

Publication series

NameProceedings of the Workshop on Algorithm Engineering and Experiments
Volume2020-January
ISSN (Print)2164-0300

Conference

Conference2020 Symposium on Algorithm Engineering and Experiments, ALENEX 2020
Country/TerritoryUnited States
CitySalt Lake City
Period5/01/206/01/20

Bibliographical note

Funding Information:
∗GB was partially supported by a research internship at CWI. HC was supported by a CSC scholarship. GF was partially supported by the Italian MIUR project PRIN 2017K7XPAN. †Università di Milano - Bicocca, Italy and CWI, The Netherlands. giulia.bernardini@unimib.it ‡King’s College London, United Kingdom. {huiping.chen,grigorios,loukides}@kcl.ac.uk §Università degli Studi di Palermo, Italy. gabriele.fici@unipa.it ¶CWI, The Netherlands. solon.pissis@cwi.nl

Publisher Copyright:
Copyright © 2020 by SIAM

ASJC Scopus subject areas

  • General Engineering
  • Applied Mathematics

Fingerprint

Dive into the research topics of 'Reverse-safe data structures for text indexing'. Together they form a unique fingerprint.

Cite this