This repository contains the code for playlist2vec.com, a website built to demo the vector search model described in our paper, "Representation, Exploration and Recommendation of Playlists".
- Intuitive search interface and Spotify integration.
- Microservices architecture for seamless scalability.
- Docker containers for streamlined deployment.
- Use of lightweight libraries (SQLite and Usearch vector search) to minimize the app footprint.
- MMAP-based vector search, enabling the application to run efficiently on budget-friendly machines such as a Raspberry Pi.
- NGINX configuration for robust traffic handling.
- Rate limiting to safeguard against DDoS attacks.
- NGINX caching to optimize server resource usage.
- Docker Swarm setup with bash-based DIY autoscaling.
- Auto-scaling capabilities using Kubernetes.
Note: This setup has been tested on Ubuntu 22.04 for both x86_64 and aarch64 architectures.
- Install Docker by following the instructions in Install Docker On Ubuntu.
- Make sure to complete the Docker post-installation steps for Linux.
Install Nginx by following the guide available at: How to Install Nginx on Ubuntu 22.04.
You can install Node.js (v20.18.0) by referring to this tutorial: How to install Node.js on Ubuntu 22.04.
Run the following command to clone the repository:
git clone https://github.com/piyp791/playlist2vec.git
Navigate to the project directory by running:
cd playlist2vec
- Copy the nginx/nginx.conf file to /etc/nginx/.
- Copy the nginx/site_config file to /etc/nginx/sites-available/<YOURSITENAME>.
- Create a symbolic link from sites-available to sites-enabled with the following command:
  sudo ln -s /etc/nginx/sites-available/<YOURSITENAME> /etc/nginx/sites-enabled/
- Remove the default configuration file by executing:
  sudo rm /etc/nginx/sites-enabled/default
- Create a cache directory for NGINX:
  sudo mkdir /var/cache/nginx
- Verify the configuration is correct by running:
  sudo nginx -t
- Restart NGINX with the command:
  sudo systemctl restart nginx
Add the following configuration to the /etc/docker/daemon.json file (on all machines in a multi-machine cluster setup) to allow the local registry needed for the Docker images:
{
"insecure-registries": ["<Registry-Host-IP>:5000"]
}
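If /etc/docker/daemon.json already holds other settings, the new key has to be merged in rather than overwriting the file. The following is a minimal sketch of that merge, not part of this repository: the function name add_insecure_registry and the address 192.0.2.10 are placeholders chosen for illustration.

```python
import json


def add_insecure_registry(config_text: str, registry: str) -> str:
    """Return daemon.json text with `registry` listed under insecure-registries.

    Existing keys are preserved, and the registry is added only once.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    registries = config.setdefault("insecure-registries", [])
    if registry not in registries:
        registries.append(registry)
    return json.dumps(config, indent=2)


# Placeholder values: an existing setting and a documentation-range IP.
existing = '{"log-driver": "json-file"}'
print(add_insecure_registry(existing, "192.0.2.10:5000"))
```

Docker reads daemon.json at startup, so restart the Docker daemon (for example with sudo systemctl restart docker) after changing the file.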
You can get the registry host IP by running the following command:
hostname -I | awk '{print $1}'
From within the project directory, execute the build script:
./build.sh
This script:
- Downloads the resources needed for the application to run.
- Creates a local registry.
- Builds the Docker images.
- Pushes the Docker images to the registry.
Finally, execute the run script:
./run.sh
This script:
- Copies the website's static resources to the nginx folder.
- Deploys the Docker Swarm setup using the built images.
This serves the website over plain HTTP. For an HTTPS frontend, it can be placed behind a service such as a Cloudflare tunnel.
By default, the build script configures the application to use the mini version of the corpus, which contains 377,000 items. In this configuration, the search index for the mini corpus is loaded fully into memory. In contrast, the search index built from the full corpus is accessed in memory-mapped mode, so only the parts of the index that a query touches are read from disk.
To change this setting, modify the .env file. Set the IS_MINI variable to false to use the full version of the corpus:
IS_MINI=false
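To illustrate the difference between the two modes, here is a small sketch that picks between a full in-memory load and a memory-mapped read based on IS_MINI. It is not the search service's code and does not use the Usearch API. It assumes the variable is read from the environment, and the raw float32 file layout, the DIM constant, and all function names are hypothetical.

```python
import mmap
import os
import struct
import tempfile

DIM = 4              # hypothetical vector dimensionality
ROW_BYTES = DIM * 4  # each float32 value takes 4 bytes


def use_mini_corpus(env=os.environ):
    """Treat IS_MINI as true unless it is set to a false-like value."""
    return env.get("IS_MINI", "true").strip().lower() not in {"false", "0", "no"}


def load_in_memory(path):
    """Read the whole file into RAM and decode every vector."""
    with open(path, "rb") as f:
        data = f.read()
    return [struct.unpack_from(f"<{DIM}f", data, offset)
            for offset in range(0, len(data), ROW_BYTES)]


def read_row_mmap(path, row):
    """Map the file and decode one vector; the OS pages in only what is touched."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return struct.unpack_from(f"<{DIM}f", mm, row * ROW_BYTES)


def get_vector(path, row, env=os.environ):
    """Fetch one vector using the mode selected by IS_MINI."""
    if use_mini_corpus(env):
        return load_in_memory(path)[row]
    return read_row_mmap(path, row)


# Build a small demo file of 1000 vectors in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
with open(path, "wb") as f:
    for i in range(1000):
        f.write(struct.pack(f"<{DIM}f", float(i), i + 0.5, i + 0.25, i + 0.125))

print(get_vector(path, 42, {"IS_MINI": "true"}))   # in-memory path
print(get_vector(path, 42, {"IS_MINI": "false"}))  # memory-mapped path
```

Both calls return the same vector. The in-memory path costs RAM proportional to the file size, while the memory-mapped path keeps the resident footprint small at the price of disk reads on first access.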
cd search-service
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn src.server:app --port 3001
cd autocomplete-service
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn src.server:app --port 3002
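The commands above start the search service on port 3001 and the autocomplete service on port 3002. A quick way to confirm that both are accepting connections is a plain TCP check, sketched below. The helper name is_listening is hypothetical, and the sketch assumes the services listen on 127.0.0.1, which is uvicorn's default host. It makes no assumption about the HTTP routes the services expose.

```python
import socket


def is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the uvicorn commands above.
for name, port in {"search-service": 3001, "autocomplete-service": 3002}.items():
    status = "up" if is_listening(port) else "down"
    print(f"{name} on port {port}: {status}")
```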
cd web-server
npm install
npm run start-dev-mode
cd web-server
npm run test-dev
Papreja, P., Venkateswara, H., Panchanathan, S. (2020). Representation, Exploration and Recommendation of Playlists. In: Cellier, P., Driessens, K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Communications in Computer and Information Science, vol 1168. Springer, Cham. https://doi.org/10.1007/978-3-030-43887-6_50