Knowledge-driven graph similarity for text classification

Niloofer Shanavas*, Hui Wang, Zhiwei Lin, Glenn Hawe

*Corresponding author for this work

doi:10.1007/s13042-020-01221-4

Computer Science

Research output: Contribution to journal › Article › peer-review

Abstract

Automatic text classification using machine learning is significantly affected by the text representation model. The structural information in text, which is necessary for natural language understanding, is usually ignored in vector-based representations. In this paper, we present a graph kernel-based text classification framework which utilises the structural information in text effectively through the weighting and enrichment of a graph-based representation. We introduce weighted co-occurrence graphs to represent text documents; these graphs weight the terms and their dependencies based on their relevance to text classification. We propose a novel method to automatically enrich the weighted graphs using semantic knowledge in the form of a word similarity matrix. The similarity between enriched graphs, which we call knowledge-driven graph similarity, is calculated using a graph kernel. The semantic knowledge in the enriched graphs ensures that the graph kernel goes beyond exact matching of terms and patterns to compute the semantic similarity of documents. In experiments on sentiment classification and topic classification tasks, our knowledge-driven similarity measure significantly outperforms the baseline text similarity measures on five benchmark text classification datasets.
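The abstract summarises the framework only at a high level. As a rough illustration of the general idea, and not the authors' implementation, the Python sketch below builds a weighted co-occurrence graph per document, enriches it from a word similarity matrix, and compares graphs with a simple kernel. All function names, the edge-weighting rule, the enrichment rule (add a related term v for node u when sim(u, v) is at least a threshold) and the linear node/edge kernel are hypothetical simplifications; the paper's own supervised term weighting, enrichment procedure and graph kernel are not reproduced here. Term weights and similarity scores are passed in as plain dictionaries.

```python
import math
from collections import defaultdict


def cooccurrence_graph(tokens, term_weight, window=2):
    """Build a weighted co-occurrence graph (illustrative weighting only).

    Returns (nodes, edges): nodes maps term -> weight taken from `term_weight`
    (unknown terms default to 1.0); edges maps frozenset({a, b}) -> weight.
    Two distinct terms are linked each time they occur within `window`
    positions of each other, adding the product of their node weights.
    """
    nodes = {t: term_weight.get(t, 1.0) for t in tokens}
    edges = defaultdict(float)
    for i, a in enumerate(tokens):
        for b in tokens[i + 1:i + 1 + window]:
            if a != b:
                edges[frozenset((a, b))] += nodes[a] * nodes[b]
    return nodes, dict(edges)


def enrich(graph, similarity, threshold=0.5):
    """Add related terms from a word similarity matrix (hypothetical rule).

    `similarity` maps term -> {related_term: score in [0, 1]}. For each
    original node u and related term v not already in the graph with
    score >= threshold, v is added with weight score * w(u) and linked to u
    by an edge of the same weight.
    """
    orig_nodes, orig_edges = graph
    nodes, edges = dict(orig_nodes), dict(orig_edges)
    for u, w_u in orig_nodes.items():
        for v, score in similarity.get(u, {}).items():
            if score >= threshold and v not in orig_nodes:
                w = score * w_u
                nodes[v] = max(nodes.get(v, 0.0), w)
                key = frozenset((u, v))
                edges[key] = max(edges.get(key, 0.0), w)
    return nodes, edges


def graph_kernel(g1, g2):
    """Sum of weight products over shared nodes and shared edges.

    This is a dot product of node/edge weight vectors, hence a valid
    positive semi-definite kernel.
    """
    (n1, e1), (n2, e2) = g1, g2
    k = sum(w * n2[t] for t, w in n1.items() if t in n2)
    k += sum(w * e2[e] for e, w in e1.items() if e in e2)
    return k


def normalized_kernel(g1, g2):
    """Cosine-style normalisation so that a graph has similarity 1 with itself."""
    denom = math.sqrt(graph_kernel(g1, g1) * graph_kernel(g2, g2))
    return graph_kernel(g1, g2) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy term weights and word similarity scores, invented for illustration.
    weights = {"good": 2.0, "great": 2.0, "movie": 0.5, "film": 0.5}
    sim = {"good": {"great": 0.8}, "great": {"good": 0.8},
           "movie": {"film": 0.9}, "film": {"movie": 0.9}}
    d1 = cooccurrence_graph(["good", "movie"], weights)
    d2 = cooccurrence_graph(["great", "film"], weights)
    print(normalized_kernel(d1, d2))  # 0.0: no shared terms or edges
    print(normalized_kernel(enrich(d1, sim), enrich(d2, sim)))  # about 0.89
```

In the toy example, "good movie" and "great film" share no terms, so their plain graphs have similarity 0; after enrichment the graphs share nodes and edges and the similarity becomes positive, which mirrors what the abstract describes as going beyond exact matching of terms. A matrix of such kernel values could then be given to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.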

Original language	English
Pages (from-to)	1067–1081
Number of pages	15
Journal	International Journal of Machine Learning and Cybernetics
Volume	12
Early online date	19 Nov 2020
DOIs	https://doi.org/10.1007/s13042-020-01221-4
Publication status	Published - Apr 2021

Keywords

Automatic text classification
Document similarity measure
Graph-based text representation
Graph enrichment
Graph kernels
Supervised term weighting
SVM

Access to Document

10.1007/s13042-020-01221-4 (Licence: Creative Commons: Attribution (CC BY))

ShanavasN2020Knowledge: Final published version, 2.06 MB (Licence: Creative Commons: Attribution (CC BY))

Cite this

@article{73dd1997fd4a4265a3ee2e6684a165df,
  title = "Knowledge-driven graph similarity for text classification",
  abstract = "Automatic text classification using machine learning is significantly affected by the text representation model. The structural information in text, which is necessary for natural language understanding, is usually ignored in vector-based representations. In this paper, we present a graph kernel-based text classification framework which utilises the structural information in text effectively through the weighting and enrichment of a graph-based representation. We introduce weighted co-occurrence graphs to represent text documents; these graphs weight the terms and their dependencies based on their relevance to text classification. We propose a novel method to automatically enrich the weighted graphs using semantic knowledge in the form of a word similarity matrix. The similarity between enriched graphs, which we call knowledge-driven graph similarity, is calculated using a graph kernel. The semantic knowledge in the enriched graphs ensures that the graph kernel goes beyond exact matching of terms and patterns to compute the semantic similarity of documents. In experiments on sentiment classification and topic classification tasks, our knowledge-driven similarity measure significantly outperforms the baseline text similarity measures on five benchmark text classification datasets.",
  keywords = "Automatic text classification, Document similarity measure, Graph-based text representation, Graph enrichment, Graph kernels, Supervised term weighting, SVM",
  author = "Niloofer Shanavas and Hui Wang and Zhiwei Lin and Glenn Hawe",
  year = "2021",
  month = apr,
  doi = "10.1007/s13042-020-01221-4",
  language = "English",
  volume = "12",
  pages = "1067--1081",
  journal = "International Journal of Machine Learning and Cybernetics",
}
