String Sanitization under Edit Distance

Giulia Bernardini, Huiping Chen, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis, Leen Stougie, Michelle Sweering

Research output: Chapter in Book/Report/Conference proceedingConference contribution

9 Citations (Scopus)

Abstract

Let W be a string of length n over an alphabet, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string XED such that: (i) no string of S occurs in XED; (ii) the order of all other length-k substrings overis the same in W and in XED; and (iii) XED has minimal edit distance to W. When W represents an individual's data and S represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in O(kn2) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of . Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in O(n2-) time, for any> 0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS. 2012 ACM Subject Classification Theory of computation ! Pattern matching.

Original languageEnglish
Title of host publication31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020
EditorsInge Li Gortz, Oren Weimann
PublisherSchloss Dagstuhl
ISBN (Electronic)9783959771498
DOIs
Publication statusPublished - 1 Jun 2020
Event31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 - Copenhagen, Denmark
Duration: 17 Jun 202019 Jun 2020

Publication series

NameLeibniz International Proceedings in Informatics, LIPIcs
Volume161
ISSN (Print)1868-8969

Conference

Conference31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020
Country/TerritoryDenmark
CityCopenhagen
Period17/06/2019/06/20

Bibliographical note

Funding Information:
Funding This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 872539. Giulia Bernardini: Partially supported by a research internship at CWI. Huiping Chen: Supported by a Ph.D scholarship from China Scholarship Council. Leen Stougie: Supported by the Netherlands Organisation for Scientific Research (NWO) through Gravitation-grant NETWORKS-024.002.003. Michelle Sweering: Supported by the Netherlands Organisation for Scientific Research (NWO) through Gravitation-grant NETWORKS-024.002.003.

Publisher Copyright:
© 2020 Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. All rights reserved.

Keywords

  • Conditional lower bound
  • Data sanitization
  • Dynamic programming
  • Edit distance
  • String algorithms

ASJC Scopus subject areas

  • Software

Fingerprint

Dive into the research topics of 'String Sanitization under Edit Distance'. Together they form a unique fingerprint.

Cite this