A Practical Human Labeling Method for Online Just-in-Time Software Defect Prediction

Liyan Song, Leandro Minku*, Cong Teng, Xin Yao*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

Just-in-Time Software Defect Prediction (JIT-SDP) can be seen as an online learning problem where additional software changes produced over time may be labeled and used to create training examples. These training examples form a data stream that can be used to update JIT-SDP models in an attempt to avoid models becoming obsolete and performing poorly. However, the labeling procedures adopted in existing online JIT-SDP studies implicitly assume that practitioners would not inspect software changes upon a defect-inducing prediction, delaying the production of training examples. This is inconsistent with a real-world scenario where practitioners would adopt JIT-SDP models and inspect certain software changes predicted as defect-inducing to check whether they really induce defects. Such inspection means that some software changes would be labeled much earlier than assumed in existing work, potentially leading to different JIT-SDP models and performance results. This paper aims at formulating a more practical human labeling procedure that takes into account the adoption of JIT-SDP models during the software development process. It then analyses whether and to what extent this procedure would impact the predictive performance of JIT-SDP models. We also propose a new method to target the labeling of software changes with the aim of saving human inspection effort. Experiments based on 14 GitHub projects revealed that adopting a more realistic labeling procedure led to significantly higher predictive performance than delaying the labeling process, meaning that existing work may have been underestimating the performance of JIT-SDP. In addition, our proposed method to target the labeling process was able to reduce human effort while maintaining predictive performance by recommending that practitioners inspect the software changes that are more likely to induce defects.
We encourage the adoption of more realistic human labeling methods in research studies to obtain an evaluation of JIT-SDP predictive performance that is closer to reality.
Original language: English
Title of host publication: ESEC/FSE 2023
Subtitle of host publication: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Editors: Satish Chandra, Kelly Blincoe, Paolo Tonella
Publisher: Association for Computing Machinery (ACM)
Pages: 605–617
Number of pages: 13
ISBN (Electronic): 9798400703270
DOIs
Publication status: Published - 30 Nov 2023
Event: 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering - San Francisco, United States
Duration: 3 Dec 2023 – 9 Dec 2023

Publication series

Name: FSE: Foundations of Software Engineering

Conference

Conference: 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Abbreviated title: ESEC/FSE '23
Country/Territory: United States
City: San Francisco
Period: 3/12/23 – 9/12/23

Bibliographical note

Acknowledgments:
This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 62002148 and 62250710682, the Program for Guangdong Introducing Innovative and Entrepreneurial Teams under Grant No. 2017ZT07X386, the Guangdong Provincial Key Laboratory under Grant No. 2020B121201001, and the Research Institute of Trustworthy Autonomous Systems (RITAS).

Keywords

  • Just-in-time software defect prediction
  • online learning
  • verification latency
  • waiting time
  • human labeling
  • human inspection
