This project contains the code for the paper "Towards Building Voice-based Conversational Recommender Systems: Datasets, Potential Solutions, and Prospects". In this project, we provide two voice-based conversational recommender system (VCRS) datasets in the e-commerce and movie domains.
You can download the datasets from Google Drive. The datasets consist of two parts: `coat.tar.gz` and `ml-1m.tar.gz`.
Each data file is an MP3 file whose name follows the pattern `diaidxx_uidxx_iidxx_xx_xx_xx.mp3`.
For example, the file `diaid21_uid249_iid35_20-30_men_251.mp3` breaks down as follows (a parsing sketch is given after the list):
- `diaid21`: corresponds to dialogue 21 in the text-based conversation dataset
- `uid249`: user id is 249
- `iid35`: item id is 35
- `20-30`: user's age is between 20 and 30
- `men`: user's gender is male
- `251`: corresponds to speaker p251 in the VCTK dataset
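For convenience, the filename fields can be recovered programmatically. Below is a minimal sketch; the helper name and regex are ours, not part of the released code:

```python
import re

# Hypothetical helper (not part of the released code): split a dataset
# filename into its six fields.
FILENAME_RE = re.compile(
    r"diaid(?P<diaid>\d+)_uid(?P<uid>\d+)_iid(?P<iid>\d+)"
    r"_(?P<age>[\d-]+)_(?P<gender>[a-z]+)_(?P<speaker>\d+)\.mp3"
)

def parse_filename(name):
    m = FILENAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    return m.groupdict()

print(parse_filename("diaid21_uid249_iid35_20-30_men_251.mp3"))
# -> {'diaid': '21', 'uid': '249', 'iid': '35',
#     'age': '20-30', 'gender': 'men', 'speaker': '251'}
```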
Speaker information for the VCTK dataset can be found here.
Here we provide a demo of a data file (i.e., `diaid21_uid249_iid35_20-30_men_251.mp3`) that contains the text and audio dialogue between the user and the agent.
demov1.mp4
Note that since we currently only explore the impact of speech on VCRSs from the user's perspective, only the user's speech is included in the provided dataset. If you want the complete dialogue audio, you can generate it with the code we provide.
We propose to extract explicit semantic features from the voice data and then incorporate them into the recommendation model in a two-phase fusion manner.
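As an illustration only (the exact architecture is described in the paper and code), a two-phase fusion could look like the following PyTorch sketch, where `voice_feat` stands for the extracted semantic voice features and all dimensions are placeholders:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of two-phase fusion (not the paper's exact model):
# phase 1 fuses voice features into the user representation before scoring;
# phase 2 fuses a voice-conditioned bias into the final prediction.
class TwoPhaseFusionRec(nn.Module):
    def __init__(self, n_users, n_items, emb_dim=32, voice_dim=16):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        self.voice_proj = nn.Linear(voice_dim, emb_dim)  # phase 1: feature-level fusion
        self.late_head = nn.Linear(voice_dim, 1)         # phase 2: score-level fusion

    def forward(self, uid, iid, voice_feat):
        u = self.user_emb(uid) + self.voice_proj(voice_feat)   # phase 1
        score = (u * self.item_emb(iid)).sum(-1)
        return score + self.late_head(voice_feat).squeeze(-1)  # phase 2

model = TwoPhaseFusionRec(n_users=290, n_items=300)
uid, iid = torch.tensor([249]), torch.tensor([35])
voice = torch.randn(1, 16)  # placeholder for extracted semantic voice features
print(model(uid, iid, voice))
```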
Please refer to here for how to run the code.
Our VCRSs dataset creation task includes four steps: (1) backbone dataset selection; (2) text-based conversation generation; (3) voice-based conversation generation; and (4) quality evaluation.
We choose Coat and ML-1M as our backbone datasets. We use user-item interactions and item features to simulate a text-based conversation between the user and the agent for recommendation, and use user features to assign proper speakers.
Please refer to here for how to generate the text-based conversation; the code is in the `./Dialogue/` directory. A simplified simulation sketch follows below.
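To give a feel for step (2), a heavily simplified, template-based simulation (our own illustration; the actual generation logic lives in `./Dialogue/`) might look like:

```python
# Illustrative only: turn one user-item interaction plus item features
# into a short templated recommendation dialogue.
def simulate_dialogue(user_id, item, history):
    turns = [
        ("user", f"Hi! I'm looking for something like {history[-1]}."),
        ("agent", f"Do you prefer the {item['category']} style?"),
        ("user", "Yes, that sounds good."),
        ("agent", f"Then I would recommend {item['name']} for you."),
    ]
    return {"uid": user_id, "turns": turns}

item = {"name": "item 35", "category": "casual"}
dialogue = simulate_dialogue(249, item, history=["item 12"])
for speaker, text in dialogue["turns"]:
    print(f"{speaker}: {text}")
```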
Please refer to here for how to generate the voice-based conversation.
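Step (3) maps each user to a plausible VCTK speaker by age and gender before synthesis. The speaker pool below is a hypothetical sketch (the real speaker table and the VITS synthesis call are in the repository):

```python
# Hypothetical speaker pool: VCTK speaker IDs bucketed by (gender, age band).
# Only the ("men", "20-30") -> p251 pairing is taken from the example filename;
# the other entries are placeholders.
SPEAKER_POOL = {
    ("men", "20-30"): ["251", "252"],
    ("women", "20-30"): ["231", "233"],
}

def assign_speaker(gender, age_band, uid):
    pool = SPEAKER_POOL[(gender, age_band)]
    return pool[uid % len(pool)]  # deterministic choice per user

speaker = assign_speaker("men", "20-30", uid=249)
print(f"user 249 -> VCTK speaker p{speaker}")
# The chosen speaker ID would then be passed to the TTS model (VITS)
# to synthesize each user utterance.
```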
We adopt the fine-grained evaluation of dialogue (FED) metric to measure the quality of the generated text-based conversation.
```bash
pip install -r requirements.txt
cd ./Evaluate/
python evaluate.py --dataset='xxx'
```

where `xxx` is `coat` or `ml-1m`. All results are saved in the `./res/` directory.
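For intuition, FED scores a dialogue by how likely a pre-trained dialogue model (DialoGPT in the original FED paper) finds hand-written positive versus negative follow-up utterances. A minimal probe along those lines (not the exact script in `./Evaluate/`, and simplified to score the whole sequence) could be:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

def follow_up_loss(context, follow_up):
    # Average NLL of context + canned follow-up under DialoGPT
    # (a simplification of FED's per-follow-up scoring).
    ids = tok.encode(context + tok.eos_token + follow_up + tok.eos_token,
                     return_tensors="pt")
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

# FED-style probe: a good dialogue should make the positive follow-up
# more likely (lower loss) than the negative one.
ctx = "Hi! Can you recommend a sci-fi movie for me?"
pos = "Wow that is really interesting."
neg = "That is really boring."
print(follow_up_loss(ctx, pos) < follow_up_loss(ctx, neg))
```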
- Convert text to audio using VITS, a SOTA end-to-end text-to-speech (TTS) model.
- Improve code efficiency with conv_rec_sys.
- Evaluate text-based conversations with FED.