GarfieldRetrieve

Project and Project Work in Text Mining, Data Mining and Big Data Analytics

Author: Enrico Benedetti

Email: enrico.benedetti5 [at] studio.unibo.it

What's this

This is a project for the course "Text Mining, Data Mining and Big Data Analytics" at my university. It consists of building and cleaning a new dataset based on the ever-updating file of Garfield strip transcripts. The dataset contains only raw text; unfortunately, there are no speech-bubble annotations. These could perhaps be added using computer vision techniques.

In the notebook I perform:

dataset creation and processing
a short analysis of the text data
unsupervised topic mining using latent semantic analysis
building a comic strip retrieval system based on unsupervised deep metric learning with Sentence Transformers
comparison of several approaches: a latent semantic indexing (LSI) baseline, a pre-trained Sentence Transformer (ST), an ST fine-tuned with triplet loss, and an ST fine-tuned with multiple negatives ranking loss
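To show how an LSI-style retrieval baseline works in principle, here is a minimal sketch. It is a self-contained NumPy illustration and not the notebook's code. The class name `LsaIndex`, the tokenizer, the smoothed IDF formula, `k=2`, and the four toy "transcripts" are all hypothetical and are not taken from the dataset. The transcripts are weighted with TF-IDF, the matrix is reduced with a truncated SVD, and a query is projected into the same latent space and ranked by cosine similarity.

```python
import re
import numpy as np


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters/apostrophes
    return re.findall(r"[a-z']+", text.lower())


class LsaIndex:
    """Sketch of an LSA baseline: TF-IDF matrix -> truncated SVD -> cosine ranking."""

    def __init__(self, docs, k=2):
        self.vocab = sorted({t for d in docs for t in tokenize(d)})
        self.index = {t: i for i, t in enumerate(self.vocab)}
        tf = np.array([self._counts(d) for d in docs])       # (n_docs, n_terms)
        df = (tf > 0).sum(axis=0)                            # document frequency per term
        self.idf = np.log((1 + len(docs)) / (1 + df)) + 1.0  # smoothed IDF (an assumption)
        X = tf * self.idf
        # Truncated SVD: keep the top-k right singular vectors, X ~ U_k S_k V_k^T
        _, S, Vt = np.linalg.svd(X, full_matrices=False)
        self.Vk = Vt[: min(k, len(S))].T                     # (n_terms, k)
        self.doc_vecs = self._normalize(X @ self.Vk)         # documents in latent space

    def _counts(self, text):
        # Raw term counts over the fixed vocabulary; unknown words are ignored
        v = np.zeros(len(self.vocab))
        for t in tokenize(text):
            if t in self.index:
                v[self.index[t]] += 1
        return v

    @staticmethod
    def _normalize(M):
        # L2-normalize along the last axis, leaving all-zero vectors unchanged
        norms = np.linalg.norm(M, axis=-1, keepdims=True)
        return M / np.where(norms == 0, 1, norms)

    def search(self, query, top_n=3):
        # Project the query the same way as the documents, then rank by cosine
        q = self._normalize((self._counts(query) * self.idf) @ self.Vk)
        scores = self.doc_vecs @ q
        order = np.argsort(-scores)[:top_n]
        return [(int(i), float(scores[i])) for i in order]


# Toy example strings, invented for illustration only
docs = [
    "I hate Mondays",
    "Mondays are the worst, I hate them",
    "Feed me lasagna",
    "Lasagna is my favorite food",
]
idx = LsaIndex(docs, k=2)
for doc_id, score in idx.search("lasagna", top_n=2):
    print(f"{score:.3f}  {docs[doc_id]}")
```

In this toy corpus the two latent dimensions roughly separate the "Mondays" strips from the "lasagna" strips. A query therefore retrieves strips that share its topic even when they do not repeat every query word. A real baseline would more likely use a library implementation such as scikit-learn's `TruncatedSVD` or gensim's `LsiModel`.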
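The two fine-tuning objectives compared above can be illustrated with a NumPy sketch of their loss functions. The definitions follow the defaults of sentence-transformers' `TripletLoss` (Euclidean distance, margin 5) and `MultipleNegativesRankingLoss` (cosine similarity scaled by 20, with the other positives in the batch acting as negatives). This is only an illustration of the math, not the notebook's training code. The function names are hypothetical, and how the notebook builds anchor/positive/negative examples from the strips is not described here.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=5.0):
    """Batch mean of max(0, ||a - p|| - ||a - n|| + margin), Euclidean distance."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def cosine_matrix(a, b):
    # Pairwise cosine similarity between the rows of a and the rows of b
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def multiple_negatives_ranking_loss(anchor, positive, scale=20.0):
    """Cross-entropy where positive[i] is the target for anchor[i] and
    every other positive[j] in the batch serves as an in-batch negative."""
    logits = scale * cosine_matrix(anchor, positive)       # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))             # correct pairs on the diagonal


# Random embeddings standing in for sentence-transformer outputs
rng = np.random.default_rng(0)
a, p, n = (rng.normal(size=(4, 8)) for _ in range(3))
print(triplet_loss(a, p, n), multiple_negatives_ranking_loss(a, p))
```

A practical difference shows up in the code. Triplet loss needs an explicit negative for every anchor. Multiple negatives ranking loss only needs (anchor, positive) pairs, because it reuses the rest of the batch as negatives.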

Feel free to use this code to build a Garfield search engine or whatever!

Files

GarfieldRetrieve.ipynb
README.md
UNLICENSE.md
garfield github.url
garfield.txt
garfield_finetuned_mn_state
garfield_finetuned_triplet_state

Repository: EnricoBenedetti/GarfieldRetrieve