🎹 🪘 🎷 🎺 🪗 🪕 🎻
Two of my passions are music and data! I realized I had a bounty of metadata about the artists I've listened to over the past several years, and I decided to take advantage of it to build something fun. From Genius, I scraped lyrics for the top 50 songs of every artist I'd listened to at least once, added some other selected top artists, did a ton of post-processing, and trained a GPT-2-based model from scratch using the AITextGen framework. The UI and back end are built in Streamlit. The vocabulary was built from scratch rather than fine-tuned from an existing model. I also fine-tuned a GPT-2-based model, available here, but this from-scratch model weighs in at a fraction of its size.
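Building the vocabulary from scratch means learning a tokenizer from the lyrics corpus instead of reusing GPT-2's original one. AITextGen's tokenizer training relies on a byte-level BPE (byte-pair encoding) tokenizer from Hugging Face's `tokenizers` library. The toy sketch below is not that implementation: it works on characters rather than bytes and only illustrates the core BPE idea of repeatedly merging the most frequent adjacent pair of symbols. The function name `learn_bpe_merges` is hypothetical.

```python
from collections import Counter


def learn_bpe_merges(corpus: str, num_merges: int):
    """Toy character-level BPE: learn up to num_merges pair merges from corpus."""
    # Each word becomes a tuple of characters plus an end-of-word marker,
    # counted by how often it occurs in the corpus.
    vocab = Counter(tuple(word) + ("</w>",) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # first most frequent pair wins ties
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab


merges, vocab = learn_bpe_merges("low low low lower", 2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
```

A real tokenizer also caps the vocabulary size and applies the learned merges to new text; both are omitted here.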
A demo is available here. Generation is resource-intensive and can be slow in the demo, so I capped song length to keep generation time reasonable. You can adjust song length and other parameters on the left, or check out GitHub to spin up your own Rockbot.
Data Prep and Cleaning Notes:
- Removed duplicate lyrics from each song
- Deduplicated songs by overall lyric similarity to remove cover versions
- Removed as much noise / junk as possible. There is still some.
- Added tokens to delineate songs
- Used language detection to remove non-English versions of songs
- Many others!
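The cleaning above was done in KNIME, so the Python below is only a hypothetical stand-in for three of the steps. It reads "removed duplicate lyrics" as dropping repeated lines within a song, treats cover versions as songs whose lyrics are at least 90% similar according to `difflib.SequenceMatcher`, and ends each song with `<|endoftext|>`, an assumed delimiter (which tokens Rockbot actually uses isn't stated here). Language filtering is left out because the specific tool isn't named.

```python
from difflib import SequenceMatcher

# Assumed song delimiter; not necessarily the token Rockbot uses.
SONG_DELIMITER = "<|endoftext|>"


def dedupe_lines(lyrics: str) -> str:
    """Drop repeated non-blank lines within one song (case-insensitive), keeping the first."""
    seen = set()
    kept = []
    for line in lyrics.splitlines():
        key = line.strip().lower()
        if key and key in seen:
            continue  # repeated lyric line, e.g. a chorus
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)


def dedupe_songs(songs: list[str], threshold: float = 0.9) -> list[str]:
    """Keep a song only if it is less similar than threshold to every song already kept."""
    kept: list[str] = []
    for song in songs:
        if all(SequenceMatcher(None, song, other).ratio() < threshold for other in kept):
            kept.append(song)
    return kept


def build_corpus(songs: list[str]) -> str:
    """Clean each song, drop near-duplicates, and terminate each song with the delimiter."""
    unique = dedupe_songs([dedupe_lines(song) for song in songs])
    return "".join(f"{song}\n{SONG_DELIMITER}\n" for song in unique)
```

Pairwise `SequenceMatcher` comparisons grow quadratically with the number of songs, so a large corpus would likely need a cheap grouping step first (for example, only comparing songs that share a title).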
Tools used:
- Python
- Streamlit
- GPT-2
- AITextGen
- LyricsGenius (retrieving lyrics for training)
- KNIME (data cleaning and post-processing)
GPT-2 Generation:
Please refer to AITextGen and Hugging Face for much more thorough documentation.
Generate with a prompt (use lower case for the song name and the first line):
Song Name
BY
Artist Name (use the name exactly as it appears in [theartists.parquet on GitHub](https://github.com/bigjoedata/rockbot/blob/main/theartists.parquet))
Beginning of song
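The prompt layout above can be assembled programmatically. This is a minimal sketch with a hypothetical `build_prompt` helper: it lower-cases the song name and first line, leaves the artist name untouched so it matches the training data, and also strips surrounding whitespace, which is an extra assumption of this sketch.

```python
def build_prompt(song_name: str, artist: str, first_line: str = "") -> str:
    """Assemble a 'song name / BY / artist / first line' prompt (hypothetical helper)."""
    lines = [song_name.strip().lower(), "BY", artist.strip()]
    if first_line.strip():
        # The opening lyric is optional; when given it is lower-cased too.
        lines.append(first_line.strip().lower())
    return "\n".join(lines)


print(build_prompt("Highway Song", "Some Artist", "Driving All Night"))
```

The resulting string could then be passed as the `prompt` argument of AITextGen's `generate()` method.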
Running your own Rockbot is easy. Visit my Streamlit-Plus repository for more details on the image build.
- Install Docker Compose
- Follow these steps:
```bash
git clone https://github.com/bigjoedata/rockbot
cd rockbot
nano docker-compose.yml # Edit the environment variables for max song length and max songs to generate to match your computing power (higher values are more resource-intensive)
docker-compose up -d # launch in detached (background) mode
```
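Both the demo's song-length governors and the `docker-compose.yml` settings come down to capping each request against configured limits. The sketch below shows one way an app could read such caps from environment variables and clamp a user's request. The variable names `MAX_SONG_LENGTH` and `MAX_SONGS` and their defaults are hypothetical, so check `docker-compose.yml` for the names Rockbot actually uses.

```python
import os
from collections.abc import Mapping

# Hypothetical variable names and defaults; not necessarily Rockbot's.
LENGTH_VAR, LENGTH_DEFAULT = "MAX_SONG_LENGTH", 256
SONGS_VAR, SONGS_DEFAULT = "MAX_SONGS", 3


def read_cap(env: Mapping[str, str], name: str, default: int) -> int:
    """Return a positive integer cap from env, falling back to default if missing or invalid."""
    try:
        value = int(env.get(name, default))
    except ValueError:
        return default
    return value if value > 0 else default


def govern(requested_length: int, requested_songs: int,
           env: Mapping[str, str] = os.environ) -> tuple[int, int]:
    """Clamp each requested setting into the range [1, cap]."""
    max_length = read_cap(env, LENGTH_VAR, LENGTH_DEFAULT)
    max_songs = read_cap(env, SONGS_VAR, SONGS_DEFAULT)
    length = min(max(requested_length, 1), max_length)
    songs = min(max(requested_songs, 1), max_songs)
    return length, songs
```

Clamping on the server side, rather than trusting the UI controls alone, keeps a single request from monopolizing a small machine.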