Abstract
Let W be a string of length n over an alphabet, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string XED such that: (i) no string of S occurs in XED; (ii) the order of all other length-k substrings overis the same in W and in XED; and (iii) XED has minimal edit distance to W. When W represents an individual's data and S represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in O(kn2) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of . Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in O(n2-) time, for any> 0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS. 2012 ACM Subject Classification Theory of computation ! Pattern matching.
Original language | English |
---|---|
Title of host publication | 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 |
Editors | Inge Li Gortz, Oren Weimann |
Publisher | Schloss Dagstuhl |
ISBN (Electronic) | 9783959771498 |
DOIs | |
Publication status | Published - 1 Jun 2020 |
Event | 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 - Copenhagen, Denmark Duration: 17 Jun 2020 → 19 Jun 2020 |
Publication series
Name | Leibniz International Proceedings in Informatics, LIPIcs |
---|---|
Volume | 161 |
ISSN (Print) | 1868-8969 |
Conference
Conference | 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 |
---|---|
Country/Territory | Denmark |
City | Copenhagen |
Period | 17/06/20 → 19/06/20 |
Bibliographical note
Funding Information:Funding This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 872539. Giulia Bernardini: Partially supported by a research internship at CWI. Huiping Chen: Supported by a Ph.D scholarship from China Scholarship Council. Leen Stougie: Supported by the Netherlands Organisation for Scientific Research (NWO) through Gravitation-grant NETWORKS-024.002.003. Michelle Sweering: Supported by the Netherlands Organisation for Scientific Research (NWO) through Gravitation-grant NETWORKS-024.002.003.
Publisher Copyright:
© 2020 Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. All rights reserved.
Keywords
- Conditional lower bound
- Data sanitization
- Dynamic programming
- Edit distance
- String algorithms
ASJC Scopus subject areas
- Software