TY - JOUR
T1 - Identifying new potential biomarkers in adrenocortical tumors based on mRNA expression data using machine learning
AU - Marquardt, Andre
AU - Landwehr, Laura-Sophie
AU - Ronchi, Cristina
AU - Dalmazi, Guido Di
AU - Riester, Anna
AU - Kollmannsberger, Philip
AU - Altieri, Barbara
AU - Fassnacht, Martin
AU - Sbiera, Silviu
PY - 2021/9/17
Y1 - 2021/9/17
N2 - Adrenocortical carcinoma (ACC) is a rare disease, associated with poor survival. Several “multiple-omics” studies characterizing ACC on a molecular level identified two different clusters correlating with patient survival (C1A and C1B). We here used the publicly available transcriptome data from the TCGA-ACC dataset (n = 79), applying machine learning (ML) methods to classify the ACC based on expression pattern in an unbiased manner. UMAP (uniform manifold approximation and projection)-based clustering resulted in two distinct groups, ACC-UMAP1 and ACC-UMAP2, that largely overlap with clusters C1B and C1A, respectively. However, subsequent use of random-forest-based learning revealed a set of new possible marker genes showing significant differential expression in the described clusters (e.g., SOAT1, EIF2A1). For validation purposes, we used a secondary dataset based on a previous study from our group, consisting of 4 normal adrenal glands and 52 benign and 7 malignant tumor samples. The results largely confirmed those obtained for the TCGA-ACC cohort. In addition, the ENSAT dataset showed a correlation between benign adrenocortical tumors and the good prognosis ACC cluster ACC-UMAP1/C1B. In conclusion, the use of ML approaches re-identified and redefined known prognostic ACC subgroups. On the other hand, the subsequent use of random-forest-based learning identified new possible prognostic marker genes for ACC.
AB - Adrenocortical carcinoma (ACC) is a rare disease, associated with poor survival. Several “multiple-omics” studies characterizing ACC on a molecular level identified two different clusters correlating with patient survival (C1A and C1B). We here used the publicly available transcriptome data from the TCGA-ACC dataset (n = 79), applying machine learning (ML) methods to classify the ACC based on expression pattern in an unbiased manner. UMAP (uniform manifold approximation and projection)-based clustering resulted in two distinct groups, ACC-UMAP1 and ACC-UMAP2, that largely overlap with clusters C1B and C1A, respectively. However, subsequent use of random-forest-based learning revealed a set of new possible marker genes showing significant differential expression in the described clusters (e.g., SOAT1, EIF2A1). For validation purposes, we used a secondary dataset based on a previous study from our group, consisting of 4 normal adrenal glands and 52 benign and 7 malignant tumor samples. The results largely confirmed those obtained for the TCGA-ACC cohort. In addition, the ENSAT dataset showed a correlation between benign adrenocortical tumors and the good prognosis ACC cluster ACC-UMAP1/C1B. In conclusion, the use of ML approaches re-identified and redefined known prognostic ACC subgroups. On the other hand, the subsequent use of random-forest-based learning identified new possible prognostic marker genes for ACC.
KW - adrenocortical carcinoma
KW - bioinformatic clustering
KW - biomarker prediction
KW - in silico analysis
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85115089873&partnerID=8YFLogxK
U2 - 10.3390/cancers13184671
DO - 10.3390/cancers13184671
M3 - Article
SN - 2072-6694
VL - 13
JO - Cancers
JF - Cancers
IS - 18
M1 - 4671
ER -