TY - GEN
T1 - Search-based diverse sampling from real-world software product lines
AU - Xiang, Yi
AU - Huang, Han
AU - Zhou, Yuren
AU - Li, Sizhe
AU - Luo, Chuan
AU - Lin, Qingwei
AU - Li, Miqing
AU - Yang, Xiaowei
PY - 2022/6/20
Y1 - 2022/6/20
N2 - Real-world software product lines (SPLs) often encompass enormous valid configurations that are impossible to enumerate. To understand properties of the space formed by all valid configurations, a feasible way is to select a small, valid and representative sample set. Even though a number of sampling strategies have been proposed, they either fail to produce diverse samples with respect to the number of selected features (an important property to characterize behaviors of configurations), or achieve diverse sampling but with limited scalability (the handleable configuration space size is limited to 10^13). To resolve this dilemma, we propose a scalable diverse sampling strategy, which uses a distance metric in combination with the novelty search algorithm to produce diverse samples in an incremental way. The distance metric is carefully designed to measure similarities between configurations, and further diversity of a sample set. The novelty search incrementally improves diversity of samples through the search for novel configurations. We evaluate our sampling algorithm on 39 real-world SPLs. It is able to generate the required number of samples for all the SPLs, including those which cannot be counted by sharpSAT, a state-of-the-art model counting solver. Moreover, it performs better than or at least competitively to some state-of-the-art samplers with respect to the diversity of the sample sets. Our results suggest that only the proposed sampler (among all tested ones) achieves scalable diverse sampling.
AB - Real-world software product lines (SPLs) often encompass enormous valid configurations that are impossible to enumerate. To understand properties of the space formed by all valid configurations, a feasible way is to select a small, valid and representative sample set. Even though a number of sampling strategies have been proposed, they either fail to produce diverse samples with respect to the number of selected features (an important property to characterize behaviors of configurations), or achieve diverse sampling but with limited scalability (the handleable configuration space size is limited to 10^13). To resolve this dilemma, we propose a scalable diverse sampling strategy, which uses a distance metric in combination with the novelty search algorithm to produce diverse samples in an incremental way. The distance metric is carefully designed to measure similarities between configurations, and further diversity of a sample set. The novelty search incrementally improves diversity of samples through the search for novel configurations. We evaluate our sampling algorithm on 39 real-world SPLs. It is able to generate the required number of samples for all the SPLs, including those which cannot be counted by sharpSAT, a state-of-the-art model counting solver. Moreover, it performs better than or at least competitively to some state-of-the-art samplers with respect to the diversity of the sample sets. Our results suggest that only the proposed sampler (among all tested ones) achieves scalable diverse sampling.
KW - Software product lines
KW - distance metric
KW - diverse sampling
KW - novelty search
UR - https://conf.researchr.org/home/icse-2022
UR - https://ieeexplore.ieee.org/xpl/conhome/1000691/all-proceedings
U2 - 10.1145/3510003.3510053
DO - 10.1145/3510003.3510053
M3 - Conference contribution
T3 - International Conference on Software Engineering. Proceedings
SP - 1945
EP - 1957
BT - 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)
PB - IEEE
T2 - 44th IEEE/ACM International Conference on Software Engineering
Y2 - 8 May 2022 through 27 May 2022
ER -