Student's name | SCIPER |
---|---|
Aleksei Kashuba | 298846 |
Ivan Yurov | 292453 |
Ekaterina Svikhnushina | 292820 |
Milestone 1 • Milestone 2 • Milestone 3
10% of the final grade
We retrieved our main dataset using the Spotify API. The dataset describes artists’ popularity in cities around the world: for each artist, it lists the top 50 cities by listener count, together with the number of listeners in each city. We would like to visualize this data on a map. To that end, for each city in the main dataset, we retrieved geographical coordinates from the World Cities Database. Additionally, we collected the population of each city from the World Cities Population dataset so that we can normalize the listener counts. We don’t expect any errors in the main dataset collected from Spotify. The location and population data may contain some inaccuracies, but the quality should be sufficient for the purposes of our visualizations. The entity-relationship diagram of our database is illustrated below.
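For readers without access to the diagram, the table layout implied by the SQL queries in this README can be sketched as follows. This is a reconstruction, not our actual DDL: the table and column names used in the queries are taken as-is, while the column types, the constraints, and the coordinate column names (`lat`, `lng`) are assumptions.

```python
import sqlite3

# Schema inferred from the queries in this README. Column types, constraints,
# and the lat/lng column names are assumptions made for illustration.
SCHEMA = """
CREATE TABLE artists (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    popularity INTEGER,
    monthly_listeners INTEGER
);
CREATE TABLE cities (
    id INTEGER PRIMARY KEY,
    city TEXT NOT NULL,
    population INTEGER,
    lat REAL,  -- assumed column name
    lng REAL   -- assumed column name
);
CREATE TABLE genres (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE artist_cities (
    artist_id INTEGER REFERENCES artists(id),
    city_id INTEGER REFERENCES cities(id),
    listeners INTEGER
);
CREATE TABLE artist_genres (
    artist_id INTEGER REFERENCES artists(id),
    genre_id INTEGER REFERENCES genres(id)
);
"""


def create_database(path=":memory:"):
    """Create an empty database with the sketched schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def table_names(conn):
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(table_names(create_database()))
```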
Most of the data cleaning was handled during the database construction phase. Currently we have a large number of distinct music genres in the dataset (see Exploratory Data Analysis). We plan to reduce them to a manageable number using clustering or a similar approach. Additionally, we will need to come up with a metric to measure similarity between cities based on their prevailing music genres and popular artists. This should be a relatively easy task to handle.
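One candidate for the city similarity metric is cosine similarity between genre profiles, where a profile maps each genre to the city's total listeners for that genre. The sketch below is only an illustration of that idea, not a metric we have committed to; the function name and the toy profiles are hypothetical.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two sparse genre profiles (genre -> listeners)."""
    shared_genres = set(profile_a) & set(profile_b)
    dot = sum(profile_a[g] * profile_b[g] for g in shared_genres)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy profiles; the numbers are made up for illustration.
city_a = {"french hip hop": 3, "french pop": 1}
city_b = {"french hip hop": 6, "french pop": 2}
city_c = {"german hip hop": 5}

if __name__ == "__main__":
    print(cosine_similarity(city_a, city_b))  # same genre mix, different scale
    print(cosine_similarity(city_a, city_c))  # no genres in common
```

Cosine similarity ignores the overall scale of a profile, so a large and a small city with the same genre mix score as highly similar. Whether that is desirable depends on how the listener counts are normalized.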
Modern means of communication and transportation have made traveling much easier. More and more people relocate abroad for work, relationships, or other reasons, raising the total foreign-born population to an impressive 300 million worldwide. With many cities turning into melting pots, cultural, social, and personal motives drive people’s curiosity to explore how the diverse backgrounds and experiences of different corners of the world intertwine and blur geographical borders. As music plays a central role in cultural identity, our project strives to provide these insights by looking into the musical preferences of people around the globe.
We would like to explore multiple ways in which this can be achieved. First, we plan to overlay the geographical map with the color-coded information about music genre popularity. Another part of the project concerns visualizing genre profiles for each city. To develop this idea further, we can create a lower-dimensional embedding space for city profiles to cluster cities with similar musical preferences. Finally, to bring the interactivity of our visualization to the next level, we consider querying Spotify about the user's favorite artists and showing the cities aligning with their musical tastes.
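The last idea, matching a user's favorite artists to cities, could start from something as simple as the sketch below. It assumes the favorite artists have already been obtained (the Spotify request itself is not shown) and scores each city by the population-normalized listener counts of those artists. The function name, the data layout, and the toy values are hypothetical.

```python
from collections import defaultdict


def rank_cities(favorite_artists, artist_city_listeners, populations):
    """Rank cities by summed per-capita listeners of the user's favorite artists.

    artist_city_listeners: artist name -> {city name -> listeners}
    populations: city name -> population
    """
    scores = defaultdict(float)
    for artist in favorite_artists:
        for city, listeners in artist_city_listeners.get(artist, {}).items():
            population = populations.get(city)
            if population:  # skip cities with unknown or zero population
                scores[city] += listeners / population
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data, not taken from our database.
toy_listeners = {
    "Artist A": {"City X": 100, "City Y": 50},
    "Artist B": {"City Y": 100},
}
toy_populations = {"City X": 1000, "City Y": 500}

if __name__ == "__main__":
    for city, score in rank_cities(["Artist A", "Artist B"], toy_listeners, toy_populations):
        print(city, round(score, 3))
```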
Simple statistics about the data:
artists | cities | genres |
---|---|---|
32923 | 3277 | 3823 |
Cities by Spotify usage:
SELECT city,
Sum(listeners) AS total_listeners
FROM artist_cities
JOIN cities
ON cities.id = artist_cities.city_id
GROUP BY city
ORDER BY total_listeners DESC
LIMIT 10;
city | total_listeners |
---|---|
Mexico City | 418013817 |
Santiago | 378280299 |
São Paulo | 340642646 |
Chicago | 278866062 |
Los Angeles | 270153240 |
Dallas | 228789729 |
Sydney | 228039225 |
Houston | 186436834 |
Paris | 183979818 |
Buenos Aires | 175351209 |
Cities by Spotify usage normalized by population:
SELECT city,
Sum(listeners)/population AS normalized_listeners,
population
FROM artist_cities
JOIN cities
ON cities.id = artist_cities.city_id
GROUP BY city,
population
ORDER BY normalized_listeners DESC
LIMIT 10;
city | normalized_listeners | population |
---|---|---|
Frederiksberg | 645 | 3142 |
San Juan | 415 | 15416 |
Elkridge | 386 | 19367 |
Quezon | 297 | 18451 |
Gehrden | 217 | 15139 |
Bulacan | 178 | 4518 |
Amsterdam | 166 | 1031000 |
Adlaon | 140 | 3647 |
San Miguel | 133 | 65661 |
Oslo | 130 | 835000 |
Note: some errors inevitably crept into our population data (for example, Frederiksberg's real population is closer to 100,000). However, we don't expect this to be a major problem.
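One way to surface such population errors is to flag cities where a single artist has more listeners than the city has inhabitants. The sketch below runs that check on a toy in-memory database; the rows are hypothetical, and only the tables and columns used in the queries above are assumed. Because Spotify's city boundaries may not match those of the population dataset, hits are candidates for manual review rather than confirmed errors.

```python
import sqlite3

# A single artist should rarely have more listeners in a city than the city
# has inhabitants, so such rows hint at a wrong population value.
SUSPICIOUS_SQL = """
    SELECT city, population, MAX(listeners) AS max_listeners
    FROM artist_cities
    JOIN cities ON cities.id = artist_cities.city_id
    GROUP BY cities.id, city, population
    HAVING MAX(listeners) > population
    ORDER BY city
"""


def suspicious_cities(conn):
    """Return (city, population, max_listeners) rows that fail the check."""
    return conn.execute(SUSPICIOUS_SQL).fetchall()


# Hypothetical toy rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, city TEXT, population INTEGER);
    CREATE TABLE artist_cities (artist_id INTEGER, city_id INTEGER, listeners INTEGER);
    INSERT INTO cities VALUES (1, 'Town A', 3000), (2, 'Town B', 800000);
    INSERT INTO artist_cities VALUES (10, 1, 25000), (11, 1, 1200), (10, 2, 40000);
""")

if __name__ == "__main__":
    print(suspicious_cities(conn))
```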
Cities that most often appear in top 50 for artists:
SELECT city,
Count(*) AS in_top_50
FROM artist_cities
JOIN cities
ON cities.id = artist_cities.city_id
GROUP BY city
ORDER BY in_top_50 DESC
LIMIT 10;
city | in_top_50 |
---|---|
Los Angeles | 17521 |
Chicago | 17180 |
Sydney | 16717 |
Toronto | 16537 |
Dallas | 15607 |
London | 15583 |
Melbourne | 15518 |
Houston | 15469 |
San Francisco | 14768 |
Brisbane | 14748 |
Artists with most listeners in their top 50 cities (proxy for most popular artists):
SELECT name,
Sum(listeners) AS total_listeners
FROM artist_cities
JOIN artists
ON artist_cities.artist_id = artists.id
GROUP BY name
ORDER BY total_listeners DESC
LIMIT 10;
name | total_listeners |
---|---|
J Balvin | 24792577 |
Justin Bieber | 23257678 |
Bad Bunny | 22819988 |
The Weeknd | 21361869 |
Daddy Yankee | 20311296 |
Ed Sheeran | 20210593 |
Billie Eilish | 19962808 |
Drake | 19346899 |
Dua Lipa | 19264039 |
Post Malone | 18856668 |
Artists by Spotify popularity score, with monthly listeners:
SELECT name,
popularity,
monthly_listeners
FROM artists
ORDER BY popularity DESC
LIMIT 10;
name | popularity | monthly_listeners |
---|---|---|
Bad Bunny | 100 | 44940166 |
Lil Uzi Vert | 96 | 25691736 |
Justin Bieber | 96 | 63010229 |
J Balvin | 96 | 54760349 |
Drake | 96 | 48631727 |
Post Malone | 95 | 52179881 |
Billie Eilish | 95 | 56725215 |
The Weeknd | 94 | 56971820 |
BTS | 94 | 20325658 |
Juice WRLD | 94 | 28604023 |
Note: seven of the ten artists in the proxy ranking also appear in the top 10 by popularity, so the proxy seems to provide a meaningful signal about artist popularity. Next, we use a similar proxy to find the most popular genres.
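The agreement between the two rankings can be checked directly from the two top-10 tables above. The snippet below copies the artist names from those tables and computes their overlap; the helper name is ours and purely illustrative.

```python
# Top 10 artists by summed listeners in their top 50 cities (proxy table above).
proxy_top10 = [
    "J Balvin", "Justin Bieber", "Bad Bunny", "The Weeknd", "Daddy Yankee",
    "Ed Sheeran", "Billie Eilish", "Drake", "Dua Lipa", "Post Malone",
]

# Top 10 artists by Spotify popularity score (popularity table above).
popularity_top10 = [
    "Bad Bunny", "Lil Uzi Vert", "Justin Bieber", "J Balvin", "Drake",
    "Post Malone", "Billie Eilish", "The Weeknd", "BTS", "Juice WRLD",
]


def shared_artists(ranking_a, ranking_b):
    """Return the artists present in both rankings, sorted by name."""
    return sorted(set(ranking_a) & set(ranking_b))


if __name__ == "__main__":
    shared = shared_artists(proxy_top10, popularity_top10)
    print(len(shared), shared)
```

This only compares set membership in the top 10. A rank correlation over the full artist list would be a stricter test of the proxy.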
Proxy for most popular genres:
SELECT genres.name,
Sum(listeners) AS total_listeners
FROM artist_cities
JOIN artists
ON artist_cities.artist_id = artists.id
JOIN artist_genres
ON artist_genres.artist_id = artists.id
JOIN genres
ON genres.id = artist_genres.genre_id
GROUP BY genres.name
ORDER BY total_listeners DESC
LIMIT 10;
name | total_listeners |
---|---|
pop | 1833385358 |
dance pop | 1191221938 |
latin | 992892784 |
rap | 812596658 |
post-teen pop | 788187051 |
pop rap | 786859077 |
rock | 643883032 |
reggaeton | 585994405 |
hip hop | 561785934 |
latin pop | 516769383 |
Proxy for most popular genres in Lausanne:
SELECT genres.name,
Sum(listeners) AS total_listeners
FROM artist_cities
JOIN cities
ON cities.id = artist_cities.city_id
JOIN artists
ON artist_cities.artist_id = artists.id
JOIN artist_genres
ON artist_genres.artist_id = artists.id
JOIN genres
ON genres.id = artist_genres.genre_id
WHERE city = 'Lausanne'
GROUP BY genres.name
ORDER BY total_listeners DESC
LIMIT 10;
name | total_listeners |
---|---|
french hip hop | 1640878 |
pop urbaine | 1417919 |
rap francais | 890962 |
french pop | 640587 |
francoton | 609597 |
german hip hop | 529343 |
rap conscient | 449738 |
chanson | 441105 |
german pop | 311419 |
nouvelle chanson francaise | 290924 |
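The same query can be reused for any city by passing the city name as a bound parameter instead of hard-coding it. The sketch below does this with Python's `sqlite3` module on a toy database; the rows are hypothetical, and the actual database engine and driver may differ from what is shown here.

```python
import sqlite3

# Same query as above, with the city and the row limit as bound parameters.
TOP_GENRES_SQL = """
    SELECT genres.name, SUM(listeners) AS total_listeners
    FROM artist_cities
    JOIN cities ON cities.id = artist_cities.city_id
    JOIN artists ON artist_cities.artist_id = artists.id
    JOIN artist_genres ON artist_genres.artist_id = artists.id
    JOIN genres ON genres.id = artist_genres.genre_id
    WHERE cities.city = ?
    GROUP BY genres.name
    ORDER BY total_listeners DESC
    LIMIT ?
"""


def top_genres(conn, city, limit=10):
    """Return (genre, total_listeners) rows for the given city."""
    return conn.execute(TOP_GENRES_SQL, (city, limit)).fetchall()


# Hypothetical toy rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE artist_cities (artist_id INTEGER, city_id INTEGER, listeners INTEGER);
    CREATE TABLE artist_genres (artist_id INTEGER, genre_id INTEGER);
    INSERT INTO artists VALUES (1, 'Artist A'), (2, 'Artist B');
    INSERT INTO cities VALUES (1, 'Town A'), (2, 'Town B');
    INSERT INTO genres VALUES (1, 'genre a'), (2, 'genre b');
    INSERT INTO artist_genres VALUES (1, 1), (1, 2), (2, 1);
    INSERT INTO artist_cities VALUES (1, 1, 100), (2, 1, 50), (1, 2, 999);
""")

if __name__ == "__main__":
    print(top_genres(conn, "Town A"))
```

Note that an artist tagged with several genres contributes their full listener count to each of them, so the totals across genres add up to more than the number of listeners.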
- What others have already done with the data?
  - We collected the dataset ourselves, so it is unique in this sense. Since the data mostly come from Spotify, the service's developers might have used it for related projects. We didn't find any interactive visualizations authored by the Spotify team that address the ideas described in the Problematic section. The only somewhat related visualization from Spotify is listed below as a source of inspiration.
- Why is your approach original?
  - Most importantly, the originality of our approach results from the way we collected the dataset. Some previous works analysed people's musical preferences based on location-based BitTorrent traffic, geo-tagged listening events mined from Twitter, and listening events from Last.fm, a music recommender service. Given that in 2019 Spotify ranked No. 1 in the list of the most popular music streaming services in the world, we expect Spotify's data to be the most insightful and representative.
  - Even though several research works investigated the music profiles of cities, geographical regions, and countries, to the best of our knowledge no working online visualization of this phenomenon exists. Moreover, none of the offline prototypes that we saw (e.g. Music Tweet Map) represents city profiles with the granularity that we aim for, nor do they provide a search and navigation panel that users can easily interpret.
- What source of inspiration do you take?
  - Inspiration for visualization:
    - Hauger et al., 2016. Music Tweet Map: A browsing interface to explore the microblogosphere of music.
    - Schedl, 2017. Investigating country-specific music preferences and music recommendation algorithms with the LFM-1b dataset.
    - Mellander et al., 2018. The geography of music preferences.
    - Music Tweet Map
    - The one million tweet map
    - What Type of Music Is Your City Most Passionate About?
    - Hoodmaps
    - Every Noise at Once
- This is the first project for which we use our dataset.
10% of the final grade
Our report is available in the root folder of this repository.
Our project prototype can be accessed by the link.
80% of the final grade