Pseudo-Label Generation for Multi-Label Text Classification

Metadata Updated: May 2, 2019

With the advent and expansion of social networking, the amount of generated text data has seen a sharp increase. In order to handle such a huge volume of text data, new and improved text mining techniques are a necessity. One of the characteristics of text data that makes text mining difficult, is multi-labelity. In order to build a robust and effective text classification method which is an integral part of text mining research, we must consider this property more closely. This kind of property is not unique to text data as it can be found in non-text (e.g., numeric) data as well. However, in text data, it is most prevalent. This property also puts the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class-labels instead of a single class, as in conventional classification. In this paper, we explore how the generation of pseudo labels (i.e., combinations of existing class labels) can help us in performing better text classification and under what kind of circumstances. During the classification, the high and sparse dimensionality of text data has also been considered. Although, here we are proposing and evaluating a text classification technique, our main focus is on the handling of the multi-labelity of text data while utilizing the correlation among multiple labels existing in the data set. Our text classification technique is called pseudo-LSC (pseudo-Label Based Subspace Clustering). It is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process to provide better performance than existing approaches. Results on three real world multi-label data sets provide us insight into how the multi-labelity is handled in our classification process and shows the effectiveness of our approach.

Access & Use Information

Public: This dataset is intended for public access and use. License: U.S. Government Work

Downloads & Resources

Dates

Metadata Created Date August 1, 2018
Metadata Updated Date May 2, 2019
Data Update Frequency irregular

Metadata Source

Harvested from NASA Data.json

Additional Metadata

Resource Type Dataset
Metadata Created Date August 1, 2018
Metadata Updated Date May 2, 2019
Publisher Dashlink
Unique Identifier DASHLINK_679
Maintainer
Nikunj Oza
Maintainer Email
Public Access Level public
Data Update Frequency irregular
Bureau Code 026:00
Metadata Context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Metadata Catalog ID https://data.nasa.gov/data.json
Schema Version https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
Datagov Dedupe Retained 20190501230127
Harvest Object Id 5e613523-87fb-4736-8a0e-5f0286ab3fdf
Harvest Source Id 39e4ad2a-47ca-4507-8258-852babd0fd99
Harvest Source Title NASA Data.json
Data First Published 2013-03-28
Homepage URL https://c3.nasa.gov/dashlink/resources/679/
License http://www.usa.gov/publicdomain/label/1.0/
Data Last Modified 2018-07-19
Program Code 026:029
Source Datajson Identifier True
Source Hash aaf7e64927c43e6e076182fed36da4339cfe8598
Source Schema Version 1.1

Didn't find what you're looking for? Suggest a dataset here.