Pseudo-Label Generation for Multi-Label Text Classification

Metadata Updated: December 6, 2023

With the advent and expansion of social networking, the amount of generated text data has increased sharply. Handling such a huge volume of text data requires new and improved text mining techniques. One characteristic of text data that makes text mining difficult is multi-labelity. To build a robust and effective text classification method, which is an integral part of text mining research, we must consider this property more closely. This property is not unique to text data, as it can also be found in non-text (e.g., numeric) data; however, it is most prevalent in text data. It also places the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class labels instead of a single class, as in conventional classification. In this paper, we explore how the generation of pseudo labels (i.e., combinations of existing class labels) can help us perform better text classification, and under what circumstances. The high and sparse dimensionality of text data is also considered during classification. Although we propose and evaluate a text classification technique here, our main focus is on handling the multi-labelity of text data while utilizing the correlation among the multiple labels existing in the data set. Our text classification technique is called pseudo-LSC (pseudo-Label Based Subspace Clustering). It is a subspace clustering algorithm that considers the high and sparse dimensionality, as well as the correlation among different class labels, during the classification process to provide better performance than existing approaches. Results on three real-world multi-label data sets provide insight into how multi-labelity is handled in our classification process and show the effectiveness of our approach.
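
The abstract describes pseudo labels only as "combinations of existing class labels". As a rough illustration of that idea, the sketch below counts label pairs that co-occur across instances and keeps the frequent ones as candidate pseudo labels. It is not the pseudo-LSC algorithm from the paper: the function name `generate_pseudo_labels`, the `min_count` and `size` parameters, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def generate_pseudo_labels(label_sets, min_count=2, size=2):
    """Return label combinations of the given size that co-occur
    in at least `min_count` instances (hypothetical helper)."""
    counts = Counter()
    for labels in label_sets:
        # Sort so each combination has one canonical tuple form.
        for combo in combinations(sorted(set(labels)), size):
            counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_count}


# Toy multi-label data: each document carries a subset of class labels.
docs = [
    {"sports", "politics"},
    {"sports", "politics", "economy"},
    {"economy"},
    {"sports", "economy"},
]

pseudo = generate_pseudo_labels(docs)
for combo, n in sorted(pseudo.items()):
    print("+".join(combo), n)
```

Each surviving combination could then be treated as an extra class alongside the original labels, which is one simple way to expose label correlation to a classifier. How the paper actually selects and uses its pseudo labels is not stated on this page.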

Access & Use Information

Public: This dataset is intended for public access and use. License: No license information was provided. If this work was prepared by an officer or employee of the United States government as part of that person's official duties, it is considered a U.S. Government Work.

Downloads & Resources

ahkh11.pdf (PDF)

Landing Page

Dates

Metadata Created Date	November 12, 2020
Metadata Updated Date	December 6, 2023
Data Update Frequency	irregular

Metadata Source

Harvested from NASA Data.json

Additional Metadata

Resource Type	Dataset
Metadata Created Date	November 12, 2020
Metadata Updated Date	December 6, 2023
Publisher	Dashlink
Maintainer	Nikunj Oza
Identifier	DASHLINK_679
Data First Published	2013-03-28
Data Last Modified	2020-01-29
Public Access Level	public
Data Update Frequency	irregular
Bureau Code	026:00
Metadata Context	https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Metadata Catalog ID	https://data.nasa.gov/data.json
Schema Version	https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby	https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id	e7fd1d9a-719b-4659-af81-67159e02b2be
Harvest Source Id	58f92550-7a01-4f00-b1b2-8dc953bd598f
Harvest Source Title	NASA Data.json
Homepage URL	https://c3.nasa.gov/dashlink/resources/679/
Program Code	026:029
Source Datajson Identifier	True
Source Hash	d067f566c6c17933825b3319f5cc76f2adc31de0ec5dd0254a8add88783feed3
Source Schema Version	1.1
