Qdrant & Audio Data


Welcome to this tutorial on vector databases and music recommendation systems using Python and Qdrant. Here, we will learn how to get started with audio data, embeddings, and vector databases.

By the end of this tutorial, you will have a good understanding of how to use vector databases and Python to create your own music recommendation engine.

1. Overview

The dataset we will be using is called the Ludwig Music Dataset (Moods and Subgenres), and it can be found on Kaggle. It was collected for the purpose of music information retrieval (MIR) from Discogs and AcousticBrainZ, and it contains over 10,000 songs of different genres and subgenres. Bear in mind that the full dataset is 12GB in size, so we recommend that you download only your favorite genre from the mp3 directory, plus the labels.json file. That will be more than enough to follow along for the rest of the tutorial.

Once you download the full dataset, you should see the following directories and files.

../data/ludwig_music_data
├── labels.json
├── mfccs
│   ├── blues
│   ├── ...
│   └── rock
├── mp3
│   ├── blues
│   ├── ...
│   └── rock
├── spectogram
│   └── spectogram
└── subgeneres.json

The labels.json file contains all the metadata (e.g., artist, subgenre, album) associated with each song.

The spectogram directory contains spectrograms, which are visual representations of the frequencies present in an audio signal over time. A spectrogram is a 2D graph where the x-axis represents time and the y-axis represents frequency; the intensity or brightness of the color indicates the strength (amplitude) of the frequencies at a particular time.

If you've ever wondered what audio data looks like visually, this is one way to visualize it.
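If you want to generate a spectrogram yourself, here is a minimal sketch using librosa and matplotlib (an illustrative addition, not part of the original notebook; the file path assumes the example song used later in this tutorial):

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load one song and compute its short-time Fourier transform.
y, sr = librosa.load(
    "../data/ludwig_music_data/mp3/latin/0rXvhxGisD2djBmNkrv5Gt.mp3", sr=44100, mono=True
)
S_db = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)  # amplitude -> decibels

# Time on the x-axis, frequency on the y-axis, brightness = amplitude in dB.
fig, ax = plt.subplots()
img = librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="hz", ax=ax)
fig.colorbar(img, ax=ax, format="%+2.0f dB")
plt.show()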

Let's get our environment set up before we prepare the data.

2. Set Up

Before you run any line of code, please make sure you have

  1. downloaded the data
  2. created a virtual environment (if not in Google Colab)
  3. installed the packages below
  4. started a container with Qdrant
# with conda or mamba if you have it installed
mamba create -n my_env python=3.10
mamba activate my_env

# or with virtualenv
python -m venv venv
source venv/bin/activate

# install packages
pip install qdrant-client transformers datasets pandas numpy torch librosa tensorflow openl3 panns-inference pedalboard streamlit

The open-source version of Qdrant is available as a Docker image, and it can be pulled and run from any machine with Docker installed. If you don't have Docker on your PC, you can follow the instructions in the official documentation. After that, open your terminal and download the image with the following command.

docker pull qdrant/qdrant

Next, initialize Qdrant with the following command and you should be good to go.

docker run -p 6333:6333 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant

Verify that you are ready to go by importing the following libraries and connecting to Qdrant via its Python client.

from transformers import AutoFeatureExtractor, AutoModel
from IPython.display import Audio as player
from datasets import load_dataset, Audio
from panns_inference import AudioTagging
from qdrant_client import QdrantClient
from qdrant_client.http import models
from os.path import join
from glob import glob
import pandas as pd
import numpy as np
import librosa
import openl3
import torch
client = QdrantClient(host="localhost", port=6333)

We will also go ahead and create the collection we will be working with in this tutorial. The vectors will have 2048 dimensions, matching the panns_inference embeddings we will generate below, and we'll set the distance metric to cosine similarity.

my_collection = "music_collection"
client.recreate_collection(
    collection_name=my_collection,
    vectors_config=models.VectorParams(size=2048, distance=models.Distance.COSINE)
)
True
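If you ever want to double-check the collection's configuration, the client can fetch it back (a quick sanity check, not part of the original notebook):

info = client.get_collection(collection_name=my_collection)
info.config.params.vectors  # should show size=2048 and the Cosine distance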

3. Data Prep

We will be using Hugging Face's datasets library to read in our data and massage it a bit.

data_path = join("..", "data", "ludwig_music_data")
data_path
'../data/ludwig_music_data'

Feel free to change the genre to the one you like the best.

music_data = load_dataset(
    "audiofolder", data_dir=join(data_path, "mp3", "latin"), split="train", drop_labels=True
)
music_data
Dataset({
    features: ['audio'],
    num_rows: 979
})
music_data[115]
{'audio': {'path': '/home/ramonperez/Tresors/qdrant_org/content/examples/data/ludwig_music_data/mp3/latin/0rXvhxGisD2djBmNkrv5Gt.mp3',
  'array': array([ 0.00000000e+00,  1.24776700e-09, -4.54397187e-10, ...,
         -7.98814446e-02, -8.84955898e-02, -1.05223551e-01]),
  'sampling_rate': 44100}}

As you can see, we get back JSON objects with an array representing each song, the path where each file is located on our PC, and the sampling rate. Let's play the song at index 115 and see what it sounds like.

player(music_data[115]['audio']['array'], rate=44100)

We'll need to extract the name of each mp3 file as this is the unique identifier we'll use in order to get the corresponding metadata for each song. While we are at it, we will also create a range of numbers and add it as the index to the dataset.

ids = [
    (
     music_data[i] # for every sample
     ['audio'] # in this directory
     ['path'] # extract the path
     .split("/") # split it by /
     [-1] # take only the last piece "id.mp3"
     .replace(".mp3", '') # and replace the .mp3 with nothing
    ) 
    for i in range(len(music_data))
]
index = list(range(len(music_data)))
ids[:4]
['0010BnyFuw94XFautS2uJp',
 '00RhgYVH6DrHl0SuZWDp8W',
 '01k69xxIQGL94F8IfIkI5l',
 '02GUIyXZ9RNusgUocEQIzN']
music_data = music_data.add_column("index", index)
music_data = music_data.add_column("ids", ids)
music_data[-1]
{'audio': {'path': '/home/ramonperez/Tresors/qdrant_org/content/examples/data/ludwig_music_data/mp3/latin/7yX4WgUfoPpMKZHgqpaZ0x.mp3',
  'array': array([ 0.00000000e+00, -1.40022882e-09, -4.44221415e-09, ...,
         -9.52053051e-02, -8.90597273e-02, -8.10846481e-02]),
  'sampling_rate': 44100},
 'index': 978,
 'ids': '7yX4WgUfoPpMKZHgqpaZ0x'}

The metadata we will use for our payload lives in the labels.json file, so let's extract it.

label_path = join(data_path, "labels.json")
labels = pd.read_json(label_path)
labels.head()
tracks
000QWvZpHrBIVrW4dGbaVI {'otherSubgenres': {'L': [{'S': 'electronic---...
0010BnyFuw94XFautS2uJp {'otherSubgenres': {'L': [{'S': ' world'}, {'S...
0055LRFB7zfdCXDGodyIz3 {'otherSubgenres': {'L': []}, 'artist': {'S': ...
005Dlt8Xaz3DkaXiRJgdiS {'otherSubgenres': {'L': [{'S': 'rock'}, {'S':...
006RpKEKItNO4q8TkAUpOv {'otherSubgenres': {'L': [{'S': 'classical---c...

As you can see, the dictionaries above contain a lot of useful information. Let's create a function to extract the data we want to retrieve for our recommendation system.

def get_metadata(x):
    cols = ['artist', 'genre', 'name', 'subgenres']
    list_of_cols = []
    for col in cols:
        try:
            mdata = list(x[col].values())[0]
        except Exception:
            mdata = "Unknown"
        list_of_cols.append(mdata)

    return pd.Series(list_of_cols, index=cols)
clean_labels = labels['tracks'].apply(get_metadata).reset_index()
clean_labels.head()
index artist genre name subgenres
0 000QWvZpHrBIVrW4dGbaVI 047 electronic General Error [{'S': 'electronic---synth-pop'}]
1 0010BnyFuw94XFautS2uJp Jimmy Buffett latin La Vie Dansante [{'S': 'latin---cubano'}]
2 0055LRFB7zfdCXDGodyIz3 New Order rock Doubts Even Here [{'S': 'rock---new wave'}]
3 005Dlt8Xaz3DkaXiRJgdiS Ricardo Arjona rock Historia de Taxi [{'S': 'rock---pop rock'}]
4 006RpKEKItNO4q8TkAUpOv Worrytrain electronic They Will Make My Passage Easy [{'S': 'electronic---ambient'}]

The last piece of the puzzle is to clean the subgenres a bit, and to extract the path to each of the files since we will need them to load the recommendations in our app later on.

def get_vals(genres):
    genre_list = []
    for dicts in genres:
        if not isinstance(dicts, str):
            for _, val in dicts.items():
                genre_list.append(val)
    return genre_list

clean_labels['subgenres'] = clean_labels.subgenres.apply(get_vals)
clean_labels['subgenres'].head()
0    [electronic---synth-pop]
1            [latin---cubano]
2           [rock---new wave]
3           [rock---pop rock]
4      [electronic---ambient]
Name: subgenres, dtype: object
file_path = join(data_path, "mp3", "latin", "*.mp3")
files = glob(file_path)
ids = [i.split('/')[-1].replace(".mp3", '') for i in files]
music_paths = pd.DataFrame(zip(ids, files), columns=["ids", 'urls'])
music_paths.head()
ids urls
0 2PaETSKl3w3IdtLIbDnQXJ ../data/ludwig_music_data/mp3/latin/2PaETSKl3w...
1 3Cu37dl54yhg2ZPrEnTx0O ../data/ludwig_music_data/mp3/latin/3Cu37dl54y...
2 4RTRzqkcvvkvuMK5IpFLmS ../data/ludwig_music_data/mp3/latin/4RTRzqkcvv...
3 5A32KQZznC2HSqr9qzTl2N ../data/ludwig_music_data/mp3/latin/5A32KQZznC...
4 2uPQvR5WBOI22Wj2gwwiT5 ../data/ludwig_music_data/mp3/latin/2uPQvR5WBO...

We'll combine all files with metadata into one dataframe and then format it as a list of JSON objects for our payload.

metadata = (music_data.select_columns(['index', 'ids'])
                     .to_pandas()
                     .merge(right=clean_labels, how="left", left_on='ids', right_on='index')
                     .merge(right=music_paths, how="left", left_on='ids', right_on='ids')
                     .drop("index_y", axis=1)
                     .rename({"index_x": "index"}, axis=1)
        )
metadata.head()
index ids artist genre name subgenres urls
0 0 0010BnyFuw94XFautS2uJp Jimmy Buffett latin La Vie Dansante [latin---cubano] ../data/ludwig_music_data/mp3/latin/0010BnyFuw...
1 1 00RhgYVH6DrHl0SuZWDp8W Jimmy Buffett latin Brown Eyed Girl [latin---cubano] ../data/ludwig_music_data/mp3/latin/00RhgYVH6D...
2 2 01k69xxIQGL94F8IfIkI5l Los Delinqüentes latin Fumata Del Ladrillo [latin---flamenco, rock---punk] ../data/ludwig_music_data/mp3/latin/01k69xxIQG...
3 3 02GUIyXZ9RNusgUocEQIzN La Bottine Souriante latin Ma Paillasse [latin---salsa] ../data/ludwig_music_data/mp3/latin/02GUIyXZ9R...
4 4 02IFfsWwxek6h9qLEH4sRA Gipsy Kings latin Estrellas [latin---flamenco] ../data/ludwig_music_data/mp3/latin/02IFfsWwxe...
payload = metadata.drop(['index', 'ids'], axis=1).to_dict(orient="records")
payload[:3]
[{'artist': 'Jimmy Buffett',
  'genre': 'latin',
  'name': 'La Vie Dansante',
  'subgenres': ['latin---cubano'],
  'urls': '../data/ludwig_music_data/mp3/latin/0010BnyFuw94XFautS2uJp.mp3'},
 {'artist': 'Jimmy Buffett',
  'genre': 'latin',
  'name': 'Brown Eyed Girl',
  'subgenres': ['latin---cubano'],
  'urls': '../data/ludwig_music_data/mp3/latin/00RhgYVH6DrHl0SuZWDp8W.mp3'},
 {'artist': 'Los Delinqüentes',
  'genre': 'latin',
  'name': 'Fumata Del Ladrillo',
  'subgenres': ['latin---flamenco', 'rock---punk'],
  'urls': '../data/ludwig_music_data/mp3/latin/01k69xxIQGL94F8IfIkI5l.mp3'}]

4. Embeddings

Audio embeddings are low dimensional vector representations of audio signals and they capture important features such as the pitch, timbre, and spatial characteristics of sound. These embeddings can be used as compact and meaningful representations of audio signals for various downstream audio processing tasks such as speech recognition, speaker recognition, music genre classification, and event detection. These embeddings are generally obtained using deep neural networks that take in an audio signal as input, and output a learned low-dimensional feature representation for that audio. In addition, these embeddings can also be used as input to further machine learning models.
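To make the idea concrete, here is a toy illustration (with made-up 4-dimensional vectors, not real embeddings) of how two songs' embeddings can be compared with cosine similarity, the same metric our Qdrant collection is configured with:

import numpy as np

# Toy vectors for illustration only; the real embeddings below have 2048 dimensions.
song_a = np.array([0.9, 0.1, 0.3, 0.0])
song_b = np.array([0.8, 0.2, 0.4, 0.1])

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

cosine_similarity(song_a, song_b)  # close to 1.0 means the two "songs" sound alike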

There are different ways in which we can get started creating embeddings for our songs:

  1. by training a deep neural network from scratch on our dataset and extracting the embedding layer,
  2. by using a pre-trained model and the transformers Python library, or
  3. by using purpose-built libraries like openl3 and panns_inference.

There are other ways, of course, but here we'll use options 2 and 3: the transformers architecture, and the openl3 and panns_inference libraries.

Important INFO: While there are three approaches showcased here, you only need to pick one to continue with the tutorial. Here, we will follow along using the output from panns_inference.

Let's get started.

openl3

OpenL3 is an open-source Python library for computing deep audio and image embeddings. It was created to provide an easy-to-use framework for extracting embeddings from audio and image data using pre-trained deep neural network models based on the self-supervised Look, Listen, and Learn (L3-Net) approach. You can choose the input representation (e.g., linear or mel spectrograms), the content type the model was trained on (music or environmental sounds), and the embedding size. Overall, OpenL3 is designed to make it easier for researchers and developers to incorporate deep audio and image embeddings into their processing workflows.

Let's read in an audio file and extract the embedding layer with openl3.

one_song = join(data_path, "mp3", "latin", "0rXvhxGisD2djBmNkrv5Gt.mp3")
audio, sr = librosa.core.load(one_song, sr=44100, mono=True)
audio.shape
(1322496,)
player(audio, rate=sr)
open_emb, ts = openl3.get_audio_embedding(audio, sr, input_repr="mel128", frontend='librosa')

The model returns an embedding vector for each timestamp, along with a vector of the timestamps themselves. This means that to get a single one-dimensional embedding for the whole song, we need to take the mean of these vectors over the time axis.

open_emb.shape, open_emb.mean(axis=0).shape, open_emb.mean(axis=0)[:20]

You can generate your embedding layer for the whole dataset with the following function. Note that by loading the model first, Kapre (the Keras audio preprocessing library openl3 uses under the hood) will run on a GPU without any further configuration.

model_kapre = openl3.models.load_audio_embedding_model(
    input_repr='mel128', content_type='music', embedding_size=512
)

def get_open_embs(batch):
    audio_arrays = [song['array'] for song in batch['audio']]
    sr_arrays = [song['sampling_rate'] for song in batch['audio']]
    embs_list, _ = openl3.get_audio_embedding(audio_arrays, sr_arrays, model=model_kapre)
    batch["open_embeddings"] = np.array([embedding.mean(axis=0) for embedding in embs_list])
    return batch
music_data = music_data.map(get_open_embs, batched=True, batch_size=20)
music_data

The nice thing about openl3 is that it comes with a model well suited to our task (we loaded it with content_type='music'). The downside is that it is the slowest of the three methods showcased here.

Panns Inference

The panns_inference library is a Python package built on top of PyTorch that provides an interface for audio tagging and sound event detection tasks. It implements PANNs (pre-trained audio neural networks), CNN-based models trained on the large-scale AudioSet dataset. The package was created to make it easy for researchers and practitioners to use these pre-trained models for inference on their own audio data without training models from scratch, and it provides a high-level, user-friendly API for loading pre-trained models, generating embeddings, and performing audio classification tasks in just a few lines of code.

The panns_inference package requires the data to be either a numpy array or a torch tensor of shape [batch, samples], so let's add a batch dimension to our song.

audio2 = audio[None, :]
audio2.shape
(1, 1322496)

Bear in mind that this next step, downloading the model, can take quite a bit of time depending on your internet speed. Afterwards, inference is quite fast, and the model returns two arrays: the clipwise output (tag probabilities for each of AudioSet's 527 classes) and the embedding.

at = AudioTagging(checkpoint_path=None, device='cuda')
Checkpoint path: /home/ramonperez/panns_data/Cnn14_mAP=0.431.pth
GPU number: 1
clipwise_output, embedding = at.inference(audio2)
clipwise_output.shape, embedding.shape
((1, 527), (1, 2048))
embedding[0, 470:500]
array([0.       , 0.       , 0.       , 0.       , 0.       , 0.       ,
       3.1233616, 0.       , 0.       , 0.       , 0.       , 0.       ,
       0.       , 0.       , 0.       , 0.       , 0.       , 0.       ,
       0.       , 1.6375436, 0.       , 0.       , 0.       , 0.       ,
       0.       , 0.       , 0.       , 0.       , 0.       , 0.       ],
      dtype=float32)

To get an embedding layer for all of the songs using the panns_inference package, you can use the following function. This is the output we will be using for the remainder of the tutorial.

def get_panns_embs(batch):
    # pad all songs in the batch to the same length so they can be stacked,
    # then move the batch to the GPU as float32
    arrays = [torch.tensor(val['array'], dtype=torch.float64) for val in batch['audio']]
    inputs = torch.nn.utils.rnn.pad_sequence(arrays, batch_first=True, padding_value=0).type(torch.cuda.FloatTensor)
    _, embedding = at.inference(inputs)
    batch['panns_embeddings'] = embedding
    return batch
music_data = music_data.map(get_panns_embs, batched=True, batch_size=8)
music_data
Dataset({
    features: ['audio', 'index', 'ids', 'panns_embeddings'],
    num_rows: 979
})

Transformers

Transformers are a type of neural network architecture best known for natural language processing, but the architecture can also be used for processing audio data by breaking the sound wave into smaller parts and learning how those parts fit together to form meaning.

We can load a pre-trained model from the Hugging Face Hub and extract the embeddings from it. Note that this approach will give us the worst result of the three, since Wav2Vec2 was trained to recognize speech rather than to classify music genres. Hence, fine-tuning Wav2Vec2 on our data might not improve the quality of the embeddings by much.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModel.from_pretrained('facebook/wav2vec2-base').to(device)
feature_extractor = AutoFeatureExtractor.from_pretrained('facebook/wav2vec2-base')

A key step before extracting the features from each song and passing them through the model is to resample the songs to 16kHz, the sampling rate Wav2Vec2 was trained on.

resampled_audio = librosa.resample(y=audio2, orig_sr=sr, target_sr=16_000)
display(player(resampled_audio, rate=16_000))
resampled_audio.shape
inputs = feature_extractor(
    resampled_audio[0], sampling_rate=feature_extractor.sampling_rate, return_tensors="pt",
    padding=True, return_attention_mask=True, truncation=True, max_length=16_000
).to(device)

inputs['input_values'].shape
torch.Size([1, 16000])
with torch.no_grad():
    embeddings = model(**inputs).last_hidden_state.mean(dim=1)
embeddings.shape
torch.Size([1, 768])

To generate the embedding layer for the whole dataset, we can use the following function.

def get_trans_embs(batch):
    audio_arrays = [x["array"] for x in batch["audio"]]

    inputs = feature_extractor(
        audio_arrays, sampling_rate=16_000, return_tensors="pt", padding=True, 
        return_attention_mask=True, max_length=16_000, truncation=True
    ).to(device)

    with torch.no_grad():
        pooled_embeds = model(**inputs).last_hidden_state.mean(dim=1)
    
    return {"transform_embeddings": pooled_embeds.cpu().numpy()}
music_data = music_data.cast_column("audio", Audio(sampling_rate=16_000))
music_data = music_data.map(get_trans_embs, batched=True, batch_size=20)
music_data

5. Building a Recommendation System

Recommendation systems are algorithms and techniques used to suggest items or content to users based on their preferences, historical data, or behavior. These systems aim to provide personalized recommendations to users, helping them discover new items of interest and enhancing their overall user experience. Recommendation systems are widely used in various domains such as e-commerce, streaming platforms, social media, and more.

Let's start by populating the collection we created earlier. If you picked the transformers approach or openl3 to follow along, you will need to recreate your collection with the appropriate dimension size, as sketched below.
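For reference, here is a hedged sketch of what that recreation could look like, using the dimension sizes shown earlier in this tutorial (512 for the openl3 model we loaded, 768 for wav2vec2-base):

# Only needed if you did NOT follow along with panns_inference.
client.recreate_collection(
    collection_name=my_collection,
    vectors_config=models.VectorParams(
        size=768,  # 512 for the openl3 model loaded above, 768 for wav2vec2-base
        distance=models.Distance.COSINE,
    ),
)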

client.upsert(
    collection_name=my_collection,
    points=models.Batch(
        ids=music_data['index'],
        vectors=music_data['panns_embeddings'],
        payloads=payload
    )
)
UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)
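As a quick sanity check (not part of the original notebook), you can count the points that made it into the collection; it should report one point per song, i.e. 979:

client.count(collection_name=my_collection)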

We can retrieve any song by its id using client.retrieve() and then extract the information in the payload with the .payload attribute.

result = client.retrieve(
    collection_name=my_collection,
    ids=[100],
    with_vectors=True # we can turn this on and off depending on our needs
)
result[0].payload
{'artist': 'La Bottine Souriante',
 'genre': 'latin',
 'name': 'Chant de la luette',
 'subgenres': ['latin---salsa'],
 'urls': '../data/ludwig_music_data/mp3/latin/0lyeChzw7IWf9ytZ7S0jDK.mp3'}
r = librosa.core.load(result[0].payload['urls'], sr=44100, mono=True)
player(r[0], rate=r[1])

You can search for similar songs with the client.search() method. Let's find an artist and a song we like, and use that song's id to grab its embedding and search for similar songs.

PS. Here is Celia Cruz. 😎

metadata.query("artist == 'Celia Cruz'")
index ids artist genre name subgenres urls
122 122 0v1oaOqkXpubdykx58BQwY Celia Cruz latin Juancito Trucupey [latin---salsa] ../data/ludwig_music_data/mp3/latin/0v1oaOqkXp...
150 150 19zWrDlXew0Fzouu7a4qhx Celia Cruz latin Cuando Sali De Cuba [latin---salsa] ../data/ludwig_music_data/mp3/latin/19zWrDlXew...
178 178 1MYds6o9aN2Wxa4TDxcJPB Celia Cruz latin Mi vida es cantar [latin---salsa] ../data/ludwig_music_data/mp3/latin/1MYds6o9aN...
459 459 3WphzI2fb2NTUsfja51U7P Celia Cruz latin Dile que por mi no tema [latin---salsa] ../data/ludwig_music_data/mp3/latin/3WphzI2fb2...
client.search(
    collection_name=my_collection,
    query_vector=music_data[150]['panns_embeddings'],
    limit=10
)
[ScoredPoint(id=150, version=0, score=0.99999994, payload={'artist': 'Celia Cruz', 'genre': 'latin', 'name': 'Cuando Sali De Cuba', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/19zWrDlXew0Fzouu7a4qhx.mp3'}, vector=None),
 ScoredPoint(id=730, version=0, score=0.9206133, payload={'artist': 'Cartola', 'genre': 'latin', 'name': 'Fita meus olhos', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/5iyRJ796USPTXEO4JXO0gC.mp3'}, vector=None),
 ScoredPoint(id=251, version=0, score=0.9087784, payload={'artist': "Oscar D'León", 'genre': 'latin', 'name': 'Volver a Verte', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/1kD5EOoZ45kjq50NLfhRGc.mp3'}, vector=None),
 ScoredPoint(id=739, version=0, score=0.90295744, payload={'artist': 'Cartola', 'genre': 'latin', 'name': 'Verde que te quero rosa', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/5plwAx4oAWnuhSwivS5Yeg.mp3'}, vector=None),
 ScoredPoint(id=268, version=0, score=0.8995003, payload={'artist': 'Chicha Libre', 'genre': 'latin', 'name': 'La cumbia del zapatero', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/1ufmU58QldvKrHuATBb3kU.mp3'}, vector=None),
 ScoredPoint(id=766, version=0, score=0.88916755, payload={'artist': 'Ska Cubano', 'genre': 'latin', 'name': 'Tequila', 'subgenres': ['latin---cubano', 'reggae'], 'urls': '../data/ludwig_music_data/mp3/latin/618iBzv4oH2wb0WElQV9ru.mp3'}, vector=None),
 ScoredPoint(id=7, version=0, score=0.8882055, payload={'artist': 'Ibrahim Ferrer', 'genre': 'latin', 'name': 'Nuestra Ruca', 'subgenres': ['latin---cubano'], 'urls': '../data/ludwig_music_data/mp3/latin/02vPUwCweGxigItnNf2Jfr.mp3'}, vector=None),
 ScoredPoint(id=467, version=0, score=0.88348734, payload={'artist': 'La-33', 'genre': 'latin', 'name': 'Soledad', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/3bpqoOSDwdaK003DPMvDJQ.mp3'}, vector=None),
 ScoredPoint(id=388, version=0, score=0.882995, payload={'artist': 'David Byrne', 'genre': 'latin', 'name': 'Loco De Amor', 'subgenres': ['latin---salsa', 'latin---samba', 'rock---pop rock'], 'urls': '../data/ludwig_music_data/mp3/latin/2uJsn2yi8HVZ8qwICHcNSW.mp3'}, vector=None),
 ScoredPoint(id=139, version=0, score=0.8820398, payload={'artist': 'Ibrahim Ferrer', 'genre': 'latin', 'name': 'Qué bueno baila usted', 'subgenres': ['latin---cubano'], 'urls': '../data/ludwig_music_data/mp3/latin/16FEEqvnZKcgfA5esxe5kL.mp3'}, vector=None)]

You can evaluate the search results by looking at the score or by listening to the songs and judging how similar they really are. I, the author, can vouch for the quality of the ones we got for Celia Cruz. 😎

The recommendation API works a bit differently: instead of a query vector, we pass the ids of positive (required) examples and, optionally, negative ones, and Qdrant will do the heavy lifting for us.

client.recommend(
    collection_name=my_collection,
    positive=[178, 122],
    limit=5
)
[ScoredPoint(id=384, version=0, score=0.96683824, payload={'artist': 'Gilberto Santa Rosa', 'genre': 'latin', 'name': 'Perdoname', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/2qqrgPaRZow7lrLttDL6Im.mp3'}, vector=None),
 ScoredPoint(id=424, version=0, score=0.9633477, payload={'artist': 'Gilberto Santa Rosa', 'genre': 'latin', 'name': 'Amanecer Borincano', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/39FQfusOwKnPCjOgQHcx6S.mp3'}, vector=None),
 ScoredPoint(id=190, version=0, score=0.9624174, payload={'artist': 'Luigi Texidor', 'genre': 'latin', 'name': 'Mi Testamento', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/1RIdI5c7RjjagAcMA5ixpv.mp3'}, vector=None),
 ScoredPoint(id=92, version=0, score=0.95979774, payload={'artist': 'Tito Puente', 'genre': 'latin', 'name': 'Mambo Gozón', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/0hk1gSyn3wKgdxqF6qaKUZ.mp3'}, vector=None),
 ScoredPoint(id=886, version=0, score=0.95851713, payload={'artist': 'Tony Vega', 'genre': 'latin', 'name': 'Ella es', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/718X6sjlHdmOzdTfJv4tUc.mp3'}, vector=None)]

Say we don't like Chayanne because his songs are too mushy. We can use the id of one of his mushiest songs so that Qdrant gets us results as far away as possible from such a song.

metadata.query("artist == 'Chayanne'")
index ids artist genre name subgenres urls
162 162 1EyREvPFfh2TgXFMCPoydD Chayanne latin Caprichosa [latin---salsa, pop---ballad] ../data/ludwig_music_data/mp3/latin/1EyREvPFfh...
208 208 1XMw83NJw29iwarOqVibos Chayanne latin Querida [latin---samba, pop---ballad] ../data/ludwig_music_data/mp3/latin/1XMw83NJw2...
385 385 2sKo5u6IppUEudIz265wYa Chayanne latin Yo Te Amo [latin---salsa, pop---ballad] ../data/ludwig_music_data/mp3/latin/2sKo5u6Ipp...
412 412 34hM4PLlhyBysgL50IWdHf Chayanne latin Y tú te vas [latin---salsa, pop---ballad] ../data/ludwig_music_data/mp3/latin/34hM4PLlhy...
645 645 4zkOTmiamebLJ39Sqbp7sb Chayanne latin Boom Boom [latin---salsa, pop---ballad] ../data/ludwig_music_data/mp3/latin/4zkOTmiame...
client.recommend(
    collection_name=my_collection,
    positive=[178, 122],
    negative=[385],
    limit=5
)
[ScoredPoint(id=546, version=0, score=0.87100524, payload={'artist': '¡Cubanismo!', 'genre': 'latin', 'name': 'El Preguntón', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/4EH5vM8p1Ibvlz5cgZLHvY.mp3'}, vector=None),
 ScoredPoint(id=85, version=0, score=0.86223793, payload={'artist': '¡Cubanismo!', 'genre': 'latin', 'name': 'Malembe', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/0efiEWiAFtHrQHTWfeDikg.mp3'}, vector=None),
 ScoredPoint(id=910, version=0, score=0.8605486, payload={'artist': '¡Cubanismo!', 'genre': 'latin', 'name': 'Cubanismo Llegó', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/7FSSdHxCoyEMfHUP6NdOb2.mp3'}, vector=None),
 ScoredPoint(id=540, version=0, score=0.85953826, payload={'artist': 'Tito Puente', 'genre': 'latin', 'name': 'Cual Es La Idea', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/4CNCGwxNp9rnVqo2fzmDYK.mp3'}, vector=None),
 ScoredPoint(id=812, version=0, score=0.85860175, payload={'artist': 'Tommy Olivencia', 'genre': 'latin', 'name': 'Trucutú', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/6I9OiSVppRGjuAweyBucE2.mp3'}, vector=None)]

Say we want to get recommendations based on a song we recently listened to and liked, while the system still remembers all of our earlier preferences.

marc_anthony_valio_la_pena = music_data[301]
client.recommend(
    collection_name=my_collection,
    positive=[marc_anthony_valio_la_pena['index'], 178, 122, 459],
    negative=[385],
    limit=5
)
[ScoredPoint(id=546, version=0, score=0.86705625, payload={'artist': '¡Cubanismo!', 'genre': 'latin', 'name': 'El Preguntón', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/4EH5vM8p1Ibvlz5cgZLHvY.mp3'}, vector=None),
 ScoredPoint(id=85, version=0, score=0.8635909, payload={'artist': '¡Cubanismo!', 'genre': 'latin', 'name': 'Malembe', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/0efiEWiAFtHrQHTWfeDikg.mp3'}, vector=None),
 ScoredPoint(id=540, version=0, score=0.8588973, payload={'artist': 'Tito Puente', 'genre': 'latin', 'name': 'Cual Es La Idea', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/4CNCGwxNp9rnVqo2fzmDYK.mp3'}, vector=None),
 ScoredPoint(id=812, version=0, score=0.85626286, payload={'artist': 'Tommy Olivencia', 'genre': 'latin', 'name': 'Trucutú', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/6I9OiSVppRGjuAweyBucE2.mp3'}, vector=None),
 ScoredPoint(id=587, version=0, score=0.85231805, payload={'artist': 'Tito Puente & His Orchestra', 'genre': 'latin', 'name': 'Mambo Gozon', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/4Sewxyw6EtUldCIz2sD9S5.mp3'}, vector=None)]

Lastly, imagine we want a samba filter for the recommendations we get. The UI could have tags for us to choose from, and Qdrant would do the rest.

samba_songs = models.Filter(
    must=[models.FieldCondition(key="subgenres", match=models.MatchAny(any=['latin---samba']))]
)
results = client.recommend(
    collection_name=my_collection,
    query_filter=samba_songs,
    positive=[marc_anthony_valio_la_pena['index'], 178, 122, 459],
    negative=[385],
    limit=5
)
results
[ScoredPoint(id=540, version=0, score=0.8588973, payload={'artist': 'Tito Puente', 'genre': 'latin', 'name': 'Cual Es La Idea', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/4CNCGwxNp9rnVqo2fzmDYK.mp3'}, vector=None),
 ScoredPoint(id=493, version=0, score=0.8236424, payload={'artist': 'Tito Nieves', 'genre': 'latin', 'name': 'De mi enamórate', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/3nnQUYKWBmHlfm5XpdWqNr.mp3'}, vector=None),
 ScoredPoint(id=92, version=0, score=0.8120091, payload={'artist': 'Tito Puente', 'genre': 'latin', 'name': 'Mambo Gozón', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/0hk1gSyn3wKgdxqF6qaKUZ.mp3'}, vector=None),
 ScoredPoint(id=856, version=0, score=0.80171, payload={'artist': 'Tito Puente', 'genre': 'latin', 'name': 'Son de la Loma', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/6c8qeNyZrTB8E3RKdPdNBh.mp3'}, vector=None),
 ScoredPoint(id=892, version=0, score=0.7895387, payload={'artist': 'David Byrne', 'genre': 'latin', 'name': 'Make Believe Mambo', 'subgenres': ['latin---salsa', 'latin---samba', 'rock---pop rock'], 'urls': '../data/ludwig_music_data/mp3/latin/74V0PhSWlBtHvBQAMYMgsX.mp3'}, vector=None)]
for result in results:
    song, sr = librosa.core.load(result.payload['urls'], sr=44100, mono=True)
    display(player(song, rate=sr))

That's it! So, what's next? You could try different genres (or all of them), create embeddings for them, and build your own recommendation engine on top of Qdrant. Better yet, you could find your own dataset and build a personalized search engine for the things you like; just make sure you let us know via our Discord channel. 😎

6. Putting it All Together

Now that we have covered everything we need, it is time to put it to the test with a UI, and for this, we'll use streamlit.

%%writefile recsys_app.py

from panns_inference import AudioTagging
from qdrant_client import QdrantClient
from pedalboard.io import AudioFile
import streamlit as st
import torch

st.title("Music Recommendation App")
st.markdown("Upload your favorite songs and get a list of recommendations from our database of music.")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
at = AudioTagging(checkpoint_path=None, device=device)
client = QdrantClient("localhost", port=6333)

music_file = st.file_uploader(label="📀 Music file 🎸",)

if music_file:
    st.audio(music_file)

    with AudioFile(music_file) as f:
        a_song = f.read(f.frames)[0][None, :]

    clip, emb = at.inference(a_song)

    st.markdown("## Semantic Search")
    results = client.search(collection_name="music_collection", query_vector=emb[0], limit=4)
    
    for result in results:
        st.header(f"Song: {result.payload['name']}")
        st.subheader(f"Artist: {result.payload['artist']}")
        st.audio(result.payload["urls"])
!streamlit run recsys_app.py