UTME: Unsupervised Taxonomy Mapping and Expansion for Document Classification

Document classification within a custom internal hierarchical taxonomy is a common challenge for organizations that deal with textual data. Traditional approaches rely on supervised techniques, which are effective on specific datasets but constrained by the need for large corpora of annotated documents. Furthermore, these models cannot be applied directly to different taxonomies. In this repository, we address this issue by introducing a methodology for classifying text within a custom hierarchical taxonomy entirely in the absence of labeled data. Our approach combines unsupervised taxonomy mapping for first-level document classification with unsupervised taxonomy expansion for dynamic adaptation to evolving content.
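
As a rough illustration of first-level mapping without labels, the sketch below treats each top-level category as a short free-text description and assigns every document to the category whose description it most resembles, using bag-of-words cosine similarity. This is an assumed, simplified stand-in rather than UTME's actual method. The function names (`tokenize`, `cosine`, `map_to_taxonomy`), the `min_score` cutoff and the toy taxonomy are hypothetical, and a real system would more plausibly compare dense text embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count alphabetic tokens (a deliberately simple bag of words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_to_taxonomy(documents, taxonomy, min_score=0.0):
    """Assign each document to its most similar first-level category.

    `taxonomy` maps a category name to a short free-text description
    (a hypothetical input format). The descriptions are the only supervision:
    no labeled documents are used. Documents whose best score does not exceed
    `min_score` are left unmapped (None).
    """
    category_vectors = {name: tokenize(desc) for name, desc in taxonomy.items()}
    assignments = []
    for doc in documents:
        doc_vector = tokenize(doc)
        scores = {name: cosine(doc_vector, vec) for name, vec in category_vectors.items()}
        best = max(scores, key=scores.get)
        assignments.append(best if scores[best] > min_score else None)
    return assignments


# Hypothetical toy taxonomy and documents, for illustration only.
taxonomy = {
    "sports": "football match team player goal league",
    "politics": "election government minister vote parliament policy",
}
docs = [
    "The team scored a late goal to win the league match.",
    "Parliament will vote on the new government policy.",
    "A quiet afternoon in the garden.",
]
print(map_to_taxonomy(docs, taxonomy))  # ['sports', 'politics', None]
```

Documents that share no vocabulary with any category description come back as `None`. In a setup like this, one could route such unmapped documents to an expansion step rather than forcing them into an existing category.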

Key Features

Unsupervised Taxonomy Mapping:
- Classifies documents within a custom user-defined hierarchical taxonomy without the need for labeled data.
Unsupervised Taxonomy Expansion:
- Expands and adapts taxonomies in an unsupervised manner.
- Utilizes document content to identify and generate new subcategories dynamically.
Graph-Based Document Relationships:
- Constructs a graph based on document similarity for exploratory visual analysis and data sampling.
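
One common way to build a document-similarity graph is a k-nearest-neighbour graph. Every document is a node, and it is linked to its k most similar peers when their similarity clears a threshold. The sketch below assumes that construction with plain bag-of-words cosine similarity. The names `build_similarity_graph`, `k`, `min_sim` and `sample_representatives` are hypothetical, and UTME's own graph may use a different similarity measure or parameters. Sampling is illustrated by picking the best-connected document in each connected component.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphabetic tokens; no stop-word removal or weighting."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_similarity_graph(documents, k=2, min_sim=0.5):
    """Build an undirected k-nearest-neighbour graph as {node: {neighbour: similarity}}.

    Each document links to at most its k most similar peers whose similarity is
    at least min_sim. Edges are stored in both directions, so a node can end up
    with more than k neighbours.
    """
    vectors = [bag_of_words(d) for d in documents]
    graph = {i: {} for i in range(len(documents))}
    for i, vi in enumerate(vectors):
        # Rank every other document by similarity to document i (highest first).
        ranked = sorted(
            ((cosine(vi, vj), j) for j, vj in enumerate(vectors) if j != i),
            reverse=True,
        )
        for sim, j in ranked[:k]:
            if sim >= min_sim:
                graph[i][j] = sim
                graph[j][i] = sim
    return graph


def connected_components(graph):
    """Return the connected components as sorted lists of node ids (iterative DFS)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


def sample_representatives(graph):
    """Pick the highest-degree node in each component (ties go to the lowest id)."""
    return [
        max(component, key=lambda n: (len(graph[n]), -n))
        for component in connected_components(graph)
    ]


# Hypothetical toy corpus with two topics.
docs = [
    "the team won the match",
    "the team lost the match",
    "a great match for the team",
    "parliament passed the budget vote",
    "the budget vote failed in parliament",
]
graph = build_similarity_graph(docs, k=2, min_sim=0.5)
print(connected_components(graph))    # [[0, 1, 2], [3, 4]]
print(sample_representatives(graph))  # [0, 3]
```

No stop words are removed here, so frequent words such as "the" still create small cross-topic similarities (up to about 0.34 in this example). The `min_sim` threshold is what keeps the two topics in separate components.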

Getting Started

Read our tutorials to see examples of UTME in action:

UTME for Hate Speech Analysis: This tutorial explores the motivation behind using UTME and the importance of automatically mining and monitoring hate speech texts.

Repository: Labic-ICMC-USP/UTME

Folders and files

Latest commit

History

Repository files navigation

UTME: Unsupervised Taxonomy Mapping and Expansion for Document Classification

Key Features

Getting Started

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages