
[Bug] Index out of bounds (IndexError) when importing playlist #4069

Open
Primemeow opened this issue Aug 24, 2023 · 25 comments
Labels
bug Something isn't working module:imports Account data import/export

Comments

@Primemeow

Primemeow commented Aug 24, 2023

Describe the bug
I am unable to import any playlists; the error below is what I get. I have tried on multiple instances and received the same error.

Steps to Reproduce

  1. Export playlist from Google Takeout
  2. Select the file in /data_control
  3. Click import

Logs
Title: Index out of bounds (IndexError)
Date: 2023-08-24T20:18:37Z
Route: /data_control?referer=%2Fsubscription_manager
Version: 2023.08.07-3450896 @ master

Backtrace

Index out of bounds (IndexError)
  from /usr/share/crystal/src/json/parser.cr:117:5 in 'update_data_control'
  from lib/kemal/src/kemal/route.cr:12:9 in '->'
  from src/invidious/helpers/handlers.cr:30:37 in 'call'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/filter_handler.cr:21:7 in 'call'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from src/ext/kemal_static_file_handler.cr:112:11 in 'call'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/init_handler.cr:12:7 in 'process'
  from /usr/share/crystal/src/http/server.cr:500:5 in '->'
  from /usr/share/crystal/src/fiber.cr:146:11 in 'run'
  from ???

Screenshots
[Screenshot not available]

Additional context

  • Browser (if applicable): Firefox 116.0.3
  • OS (if applicable): Windows 11
Primemeow added the bug label Aug 24, 2023
@unixfox
Member

unixfox commented Aug 26, 2023

Check if this solves your issue: #4048 (comment)

@Primemeow
Author

> Check if this solves your issue: #4048 (comment)

That seems to be an unrelated issue: they were trying to import subscriptions using the playlist dialog and got the error. I'm trying to import playlists through the playlist dialog and I'm getting that error.

@c13mn14k

c13mn14k commented Oct 19, 2023

Same issue. Maybe YouTube has changed the export CSV? I don't have a way to verify it, but I looked at the code:

def parse_playlist_export_csv(user : User, raw_input : String)
  # Split the input into head and body content
  raw_head, raw_body = raw_input.strip('\n').split("\n\n", limit: 2, remove_empty: true)

  # Create the playlist from the head content
  csv_head = CSV.new(raw_head.strip('\n'), headers: true)
  csv_head.next
  title = csv_head[4]
  description = csv_head[5]
  visibility = csv_head[6]

It seems that this function expects these fields to be in the CSV:

  • title
  • description
  • visibility

but they are not in the CSV. The title is in the filename, but the other fields are not present anywhere.
My playlist CSV is named +-filmy.csv and its two rows are:

Identyfikator filmu,Sygnatura czasowa utworzenia filmu z playlisty
j-qzjKaIZnc,2019-10-11T00:13:53+00:00

(The Polish header means "Video ID,Playlist Video Creation Timestamp".) This suggests a discrepancy and a possible change in the export CSV schema.
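To illustrate why the new export trips this code, here is a rough Python approximation of the head/body split at the top of `parse_playlist_export_csv` (an illustration only, not Invidious code; `split_head_body` is a hypothetical name). The old format has a metadata block and a video block separated by a blank line; the new per-playlist file has no blank line, so the split yields a single chunk and the two-way unpacking fails. In Crystal, the equivalent multiple assignment is a likely source of the IndexError, although the backtrace does not pinpoint the line.

```python
def split_head_body(raw_input: str) -> tuple[str, str]:
    """Approximate raw_input.strip('\\n').split("\\n\\n", limit: 2, remove_empty: true)."""
    parts = [p for p in raw_input.strip("\n").split("\n\n", 1) if p]
    # Like `raw_head, raw_body = ...` in Crystal, this fails when there is
    # no blank line separating the metadata block from the video block.
    raw_head, raw_body = parts
    return raw_head, raw_body


OLD_FORMAT = (
    "Video Id,Unused 1,Unused 2,Unused 3,Title,Description,Visibility\n"
    ",,,,My playlist,,Private\n"
    "\n"
    "Video Id,Unused 1,Unused 2,Unused 3,Title,Description,Visibility\n"
    "j-qzjKaIZnc,,,,,,\n"
)

NEW_FORMAT = (
    "Video ID,Playlist Video Creation Timestamp\n"
    "j-qzjKaIZnc,2019-10-11T00:13:53+00:00\n"
)

head, body = split_head_body(OLD_FORMAT)
# Column index 4 of the metadata row holds the title, as in csv_head[4].
print(head.splitlines()[1].split(",")[4])  # My playlist

try:
    split_head_body(NEW_FORMAT)
except ValueError as err:
    print("new format fails:", err)
```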

Update: YouTube definitely changed the schema. The exported CSVs are now as follows.

For n playlists, YouTube gives n+1 files:

  • n CSV files named {playlist-title}-videos.csv with the schema video id,timestamp when added to playlist
  • 1 CSV file named playlist(s).csv with the schema id, ... some bs ... ,original playlist title,original playlist title language,timestamp created,timestamp updated,playlist videos order,visibility.

Do note that I am translating the titles and schema from Polish.

I'll try to write a PR for this bug later today, but I've only known about Crystal since today, so I may give up during environment setup :/

Additionally, since the schema changed, a PR that changes the import function is necessary anyway. I think it would be very useful to support importing all playlists at once as a directory. Since the data is split across two types of files, users will need to upload at least two files even to import a single playlist, and it would be convenient not to have to upload these files separately for each playlist. There are still two problems:

The files with video IDs do not contain the playlist ID, so it may be difficult to join the data between the two types of CSV that YouTube provides. The only identifiers are the playlist title, which is obviously unreliable and may also be escaped because it is part of the filename, and the playlist creation timestamp. The latter can be used to find the CSV file containing a video whose "added to playlist" timestamp matches the playlist's creation timestamp; such a video should exist, and I think this would work reliably in most cases.

The second problem is that users may not want to upload all playlists, but they can simply delete the CSV files of the playlists they don't want.
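The timestamp-matching idea for joining the two kinds of files could look roughly like this. It is a sketch resting on the unverified assumption above that a playlist's creation timestamp equals the "added" timestamp of one of its videos. The input dictionaries are hypothetical, already-parsed data (the real column names depend on the account language), and the IDs in the example are made up. Playlists with no candidate file, or with more than one, are left out so they can be resolved by hand.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2019-10-11T00:13:53+00:00."""
    return datetime.fromisoformat(value.strip())


def match_files_to_playlists(
    playlists: dict[str, str],          # hypothetical: playlist id -> creation timestamp
    video_files: dict[str, list[str]],  # hypothetical: filename -> "added" timestamps
) -> dict[str, str]:
    """Map each playlist id to the single videos file that contains a video
    added at exactly the playlist's creation time."""
    matches: dict[str, str] = {}
    for playlist_id, created in playlists.items():
        created_ts = parse_ts(created)
        candidates = [
            fname
            for fname, added in video_files.items()
            if any(parse_ts(ts) == created_ts for ts in added)
        ]
        if len(candidates) == 1:  # skip ambiguous or unmatched playlists
            matches[playlist_id] = candidates[0]
    return matches


playlists = {
    "PLaaa": "2019-10-11T00:13:53+00:00",
    "PLbbb": "2020-01-01T00:00:00+00:00",
}
video_files = {
    "filmy-videos.csv": ["2019-10-11T00:13:53+00:00", "2019-10-12T08:00:00+00:00"],
    "other-videos.csv": ["2021-05-05T10:00:00+00:00"],
}
print(match_files_to_playlists(playlists, video_files))
# {'PLaaa': 'filmy-videos.csv'}
```

Comparing parsed `datetime` values rather than raw strings means two spellings of the same instant (different UTC offsets) still match.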

@raxod502

raxod502 commented Nov 24, 2023

I wrote a small script that can be used as a starting point for transforming the new CSV format into the old format, as a temporary workaround until we can patch Invidious to parse the new format directly. It's a bit of a hack and you'll probably have to fix some things. Also, some of the videos are still missed when importing into Invidious. I'm not sure why. (the latter part was user error)

You give as arguments the path of the playlists subdirectory in Google Takeout, and the path to a new directory that will be completely deleted and replaced with a fixed version of the playlists. You probably should review the script before running it, as with any other unvetted code on the internet.

#!/usr/bin/env python3

import argparse
import csv
import os
import pathlib
import shutil

parser = argparse.ArgumentParser()
parser.add_argument("input_dir")
parser.add_argument("output_dir")
args = parser.parse_args()

input_dir = pathlib.Path(args.input_dir).resolve()
output_dir = pathlib.Path(args.output_dir).resolve()

os.chdir(input_dir)

playlists = {}

with open("playlists.csv") as f:
    for record in csv.DictReader(f):
        playlist_name = record["Playlist Title (Original)"]
        playlists[playlist_name] = record

videos_by_playlist = {}

mangled_names = {
    "Winter _23": "Winter '23",
    "New Year_s Day": "New Year's Day",
}

for fname in os.listdir():
    if not fname.endswith(".csv"):
        continue
    if fname == "playlists.csv":
        continue
    assert fname.endswith("-videos.csv"), fname
    playlist_name = fname.removesuffix("-videos.csv")
    playlist_name = mangled_names.get(playlist_name, playlist_name)
    videos = []
    with open(fname) as f:
        for record in csv.DictReader(f):
            videos.append(record)
    playlists[playlist_name]["videos"] = videos

try:
    shutil.rmtree(output_dir)
except FileNotFoundError:
    pass

output_dir.mkdir()

for playlist_name, playlist in playlists.items():
    visibility = playlist["Playlist Visibility"]
    header = "Video Id,Unused 1,Unused 2,Unused 3,Title,Description,Visibility"
    lines = []
    lines.append(header)
    lines.append(f",,,,{playlist_name},,{visibility}")
    lines.append("")
    lines.append(header)
    for video in playlist["videos"]:
        video_id = video["Video ID"].strip()
        lines.append(f"{video_id},,,,,,")
    with open(output_dir / f"{playlist_name}.csv", "w") as f:
        f.writelines(line + "\n" for line in lines)
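One fragile spot in the script above is that it builds CSV lines with f-strings, so a playlist title containing a comma or a double quote produces a malformed row. Below is a sketch of the same output written with Python's `csv` module, which quotes fields only when needed. The column layout is copied from the script, `build_legacy_csv` is a hypothetical helper, and it assumes Invidious' CSV parser accepts standard RFC 4180 quoting.

```python
import csv
import io

# Header used by the script above for the legacy Invidious playlist format.
LEGACY_HEADER = ["Video Id", "Unused 1", "Unused 2", "Unused 3",
                 "Title", "Description", "Visibility"]


def build_legacy_csv(title: str, visibility: str, video_ids: list[str]) -> str:
    """Return the legacy two-section CSV: metadata block, blank line, video block."""
    buf = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" line endings.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(LEGACY_HEADER)
    writer.writerow(["", "", "", "", title, "", visibility])
    buf.write("\n")  # blank line separating the metadata block from the videos
    writer.writerow(LEGACY_HEADER)
    for video_id in video_ids:
        writer.writerow([video_id.strip(), "", "", "", "", "", ""])
    return buf.getvalue()


print(build_legacy_csv('Music, "chill"', "Private", ["j-qzjKaIZnc"]))
```

With that title, the metadata row comes out as `,,,,"Music, ""chill""",,Private`, while plain titles and video rows look the same as the script's output.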

@DOOMMARINE117

> I wrote a small script that can be used as a starting point for transforming the new CSV format into the old format, as a temporary workaround until we can patch Invidious to parse the new format directly. It's a bit of a hack and you'll probably have to fix some things. Also, some of the videos are still missed when importing into Invidious. I'm not sure why.

How do I add this in? On the webpage?

@raxod502

The script is for transforming the Google Takeout CSV files before import.

@DOOMMARINE117

> The script is for transforming the Google Takeout CSV files before import.

I see, so I save this as a text file? Then place it in the Takeout folder?

@raxod502

raxod502 commented Dec 2, 2023

You'll need some Python knowledge to be able to properly use and adapt the script to your use case. I would recommend seeking advice elsewhere, the issue tracker is primarily for technical discussion rather than general support.

@Ajimaru

Ajimaru commented Dec 28, 2023

This code only works if the data was exported while the Google account language was set to English, and Google has again changed some of the naming. I have adjusted the Python code:

#!/usr/bin/env python3

import argparse
import csv
import os
import pathlib
import shutil

parser = argparse.ArgumentParser()
parser.add_argument("input_dir")
parser.add_argument("output_dir")
args = parser.parse_args()

input_dir = pathlib.Path(args.input_dir).resolve()
output_dir = pathlib.Path(args.output_dir).resolve()

os.chdir(input_dir)

playlists = {}

with open("playlists.csv") as f:
    for record in csv.DictReader(f):
        playlist_name = record["Playlist title (original)"] # title with lower case t
        playlists[playlist_name] = record

videos_by_playlist = {}

mangled_names = {
    "Winter _23": "Winter '23",
    "New Year_s Day": "New Year's Day",
}

for fname in os.listdir():
    if not fname.endswith(".csv"):
        continue
    if fname == "playlists.csv":
        continue
    assert fname.endswith(" videos.csv"), fname # hyphen replaced by a space
    playlist_name = fname.removesuffix(" videos.csv") # hyphen replaced by a space
    playlist_name = mangled_names.get(playlist_name, playlist_name)
    videos = []
    with open(fname) as f:
        for record in csv.DictReader(f):
            videos.append(record)
    playlists[playlist_name]["videos"] = videos

try:
    shutil.rmtree(output_dir)
except FileNotFoundError:
    pass

output_dir.mkdir()

for playlist_name, playlist in playlists.items():
    visibility = playlist["Playlist visibility"] # visibility with lower case v
    header = "Video Id,Unused 1,Unused 2,Unused 3,Title,Description,Visibility"
    lines = []
    lines.append(header)
    lines.append(f",,,,{playlist_name},,{visibility}")
    lines.append("")
    lines.append(header)
    for video in playlist["videos"]:
        video_id = video["Video ID"].strip()
        lines.append(f"{video_id},,,,,,")
    with open(output_dir / f"{playlist_name}.csv", "w") as f:
        f.writelines(line + "\n" for line in lines)

@fkrueger

fkrueger commented Jan 7, 2024

Just a quick workaround so people don't need to find this bug report in order to import their playlists again, since it's been 2+ months now.

#4379

edit: Redid the whole shebang to submit a pull request with a feature, as per the documentation :-)

@d4g

d4g commented Jan 17, 2024

Modified the script for the German version and also fixed the newline characters on Windows. If you only need the newline fix:
Replace

    with open(output_dir / f"{playlist_name}.csv", "w") as f:

with

    with open(output_dir / f"{playlist_name}.csv", "w", newline='\n') as f:
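Why this helps: in text mode, Python translates every "\n" written to a file into `os.linesep` unless `newline` is given, so on Windows the script's output gets "\r\n" line endings. Passing `newline='\n'` (or `newline=''`) turns that translation off. This plausibly matters because the Invidious parser splits the file on "\n\n", which never occurs in "\r\n\r\n" (an inference from the parsing code quoted earlier, not something tested here). A small self-contained check with a temporary file:

```python
import tempfile
from pathlib import Path

lines = ["header", ",,,,My playlist,,Private", "", "header", "j-qzjKaIZnc,,,,,,"]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "playlist.csv"
    # newline="\n" disables newline translation, so "\n" is written as-is on every OS.
    with open(path, "w", newline="\n", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)
    data = path.read_bytes()

print(b"\r\n" in data)  # False on every platform
print(b"\n\n" in data)  # True: the blank separator line survives as "\n\n"
```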

Whole script:

#!/usr/bin/env python3

import argparse
import csv
import os
import pathlib
import shutil

parser = argparse.ArgumentParser()
parser.add_argument("input_dir")
parser.add_argument("output_dir")
args = parser.parse_args()

input_dir = pathlib.Path(args.input_dir).resolve()
output_dir = pathlib.Path(args.output_dir).resolve()

os.chdir(input_dir)

playlists = {}

with open("playlists.csv") as f:
    for record in csv.DictReader(f):
        # playlist_name = record["Playlist title (original)"] # English header
        playlist_name = record["Playlist-Titel (Original)"] # German header
        playlists[playlist_name] = record

videos_by_playlist = {}

mangled_names = {
    "Winter _23": "Winter '23",
    "New Year_s Day": "New Year's Day",
}

for fname in os.listdir():
    if not fname.endswith(".csv"):
        continue
    if fname == "Playlists.csv":
        continue
    assert fname.endswith("-Videos.csv"), fname # German export keeps the hyphen: "-Videos.csv"
    playlist_name = fname.removesuffix("-Videos.csv")
    playlist_name = mangled_names.get(playlist_name, playlist_name)
    videos = []
    with open(fname) as f:
        for record in csv.DictReader(f):
            videos.append(record)
    playlists[playlist_name]["videos"] = videos

try:
    shutil.rmtree(output_dir)
except FileNotFoundError:
    pass

output_dir.mkdir()

for playlist_name, playlist in playlists.items():
    visibility = playlist["Playlist-Sichtbarkeit"] # German header for "Playlist visibility"
    header = "Video Id,Unused 1,Unused 2,Unused 3,Title,Description,Visibility"
    lines = []
    lines.append(header)
    lines.append(f",,,,{playlist_name},,{visibility}")
    lines.append("")
    lines.append(header)
    for video in playlist["videos"]:
        video_id = video["Video-ID"].strip()
        lines.append(f"{video_id},,,,,,")
    with open(output_dir / f"{playlist_name}.csv", "w", newline='\n') as f:
        f.writelines(line + "\n" for line in lines)
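Since the column names change with the account language (and, judging by the versions above, with capitalization over time), one way to avoid keeping a separate script per language is to look up each column from a list of known header variants. This is a sketch that only knows the English and German variants quoted in this thread; other languages would need their own entries, `find_column` is a hypothetical helper, and stripping a leading byte-order mark is only a precaution, not something observed in these exports.

```python
import csv
import io

# Header variants quoted in this thread. Matching is case-insensitive, so the
# English "Playlist Title (Original)" also covers "Playlist title (original)".
TITLE_HEADERS = ["Playlist Title (Original)", "Playlist-Titel (Original)"]
VISIBILITY_HEADERS = ["Playlist Visibility", "Playlist-Sichtbarkeit"]
VIDEO_ID_HEADERS = ["Video ID", "Video-ID"]


def find_column(fieldnames: list[str], candidates: list[str]) -> str:
    """Return the actual header in `fieldnames` that matches one of `candidates`."""
    # Map normalized header -> original header (ignore case, spaces, a leading BOM).
    normalized = {name.lstrip("\ufeff").strip().lower(): name for name in fieldnames}
    for candidate in candidates:
        key = candidate.strip().lower()
        if key in normalized:
            return normalized[key]
    raise KeyError(f"none of {candidates} found in header {fieldnames}")


# Made-up sample with German headers.
sample = "Playlist-Titel (Original),Playlist-Sichtbarkeit\nTest,Private\n"
reader = csv.DictReader(io.StringIO(sample))
title_col = find_column(reader.fieldnames, TITLE_HEADERS)
visibility_col = find_column(reader.fieldnames, VISIBILITY_HEADERS)
for row in reader:
    print(row[title_col], row[visibility_col])  # Test Private
```

A `KeyError` from `find_column` then names the headers the file actually has, which makes an unsupported export language easy to spot.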

@jrmain

jrmain commented Feb 27, 2024

With a few modifications, I was able to get the script provided above to work with my playlists.

Well, mostly. I found that importing a long list causes the web server to time out with "504 Gateway Time-out". The timeout (on the inv.n8pjl.ca instance) occurs 60 seconds after the import begins. The playlist does get partially imported, creating an Invidious playlist with 256 entries out of the 267 in the source list. Update: I tried a few more, and it's definitely a hard time limit of 60 seconds, and nothing to do with the number of entries in the playlist.

Here's my version (used with Python 3 on Windows):

#!/usr/bin/env python3

import argparse
import csv
import os
import pathlib
import shutil

parser = argparse.ArgumentParser()
parser.add_argument("input_dir")
parser.add_argument("output_dir")
args = parser.parse_args()

input_dir = pathlib.Path(args.input_dir).resolve()
output_dir = pathlib.Path(args.output_dir).resolve()
print(f"Input directory: {input_dir}")
print(f"Output directory: {output_dir}")

os.chdir(input_dir)

playlists = {}

with open("playlists.csv") as f:
    for record in csv.DictReader(f):
        playlist_name = record["Playlist title (original)"]
        playlists[playlist_name] = record

videos_by_playlist = {}

mangled_names = {
    "Winter _23": "Winter '23",
    "New Year_s Day": "New Year's Day",
    "Music - trippy, chill, ambient, and otherwise m":"Music - trippy, chill, ambient, and otherwise mellow",
}

for fname in os.listdir():
    if not fname.endswith(".csv"):
        continue
    if fname == "playlists.csv":
        continue
    playlist_name = fname.removesuffix(" videos.csv")
    playlist_name = playlist_name.removesuffix(".csv")
    print(f"Playlist name: {playlist_name}")
    playlist_name = mangled_names.get(playlist_name, playlist_name)
    videos = []
    with open(fname) as f:
        for record in csv.DictReader(f):
            videos.append(record)
    playlists[playlist_name]["videos"] = videos

try:
    shutil.rmtree(output_dir)
except FileNotFoundError:
    pass

output_dir.mkdir()

for playlist_name, playlist in playlists.items():
    visibility = playlist["Playlist visibility"]
    header = "Video Id,Unused 1,Unused 2,Unused 3,Title,Description,Visibility"
    lines = []
    lines.append(header)
    lines.append(f""",,,,"{playlist_name}",,{visibility}""")
    lines.append("")
    lines.append(header)
    for video in playlist["videos"]:
        video_id = video["Video ID"].strip()
        lines.append(f"""{video_id},,,,"{playlist_name}",,{visibility}""")
    with open(output_dir / f"{playlist_name}.csv", "w", newline='\n') as f:
        f.writelines(line + "\n" for line in lines)
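Regarding the 60-second timeout described above: one possible workaround (an untested assumption, not something confirmed in this thread) is to split a long playlist into several smaller files and import them one at a time. Each imported file would become a separate Invidious playlist. The chunk size of 200 is an arbitrary guess, `split_playlist` is a hypothetical helper, and the IDs in the example are made up.

```python
def split_playlist(title: str, video_ids: list[str],
                   chunk_size: int = 200) -> list[tuple[str, list[str]]]:
    """Split video_ids into consecutive chunks, each paired with a numbered title."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunks = [video_ids[i:i + chunk_size]
              for i in range(0, len(video_ids), chunk_size)]
    if len(chunks) <= 1:
        return [(title, video_ids)]  # short playlists are left unchanged
    total = len(chunks)
    return [(f"{title} ({n}/{total})", chunk)
            for n, chunk in enumerate(chunks, start=1)]


# Made-up IDs standing in for a 267-entry playlist.
ids = [f"id{n:03d}" for n in range(267)]
for part_title, part in split_playlist("My playlist", ids):
    print(part_title, len(part))
# My playlist (1/2) 200
# My playlist (2/2) 67
```

Each `(title, ids)` pair can then be written out the same way the script writes a single playlist.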

@ghost

ghost commented Mar 29, 2024

> I wrote a small script that can be used as a starting point for transforming the new CSV format into the old format, as a temporary workaround until we can patch Invidious to parse the new format directly. It's a bit of a hack and you'll probably have to fix some things. Also, some of the videos are still missed when importing into Invidious. I'm not sure why. (the latter part was user error)

Serious warning: do not set output_dir to ~, or you will lose (almost) everything in your home folder on Linux. I learned this the hard way.
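The scripts call `shutil.rmtree(output_dir)` unconditionally, which is what makes a mistyped output path so destructive. A guard along these lines could replace that call. This is a sketch, not part of any posted script: the hypothetical `reset_output_dir` refuses to delete the filesystem root, the home directory, the input directory or any of its parents, and any existing directory that contains something other than `.csv` files.

```python
import shutil
from pathlib import Path


def reset_output_dir(output_dir: Path, input_dir: Path) -> None:
    """Delete and recreate output_dir, refusing obviously dangerous targets."""
    out = output_dir.expanduser().resolve()
    inp = input_dir.expanduser().resolve()

    if out == Path(out.anchor) or out == Path.home().resolve():
        raise SystemExit(f"refusing to delete {out}: root or home directory")
    if out == inp or out in inp.parents:
        raise SystemExit(f"refusing to delete {out}: it contains the input directory")
    if out.exists():
        # Only delete directories that look like previous output (CSV files only).
        unexpected = [p for p in out.iterdir()
                      if not (p.is_file() and p.suffix == ".csv")]
        if unexpected:
            raise SystemExit(f"refusing to delete {out}: it contains non-CSV entries")
        shutil.rmtree(out)
    out.mkdir(parents=True)
```

In the scripts, `reset_output_dir(output_dir, input_dir)` would stand in for the `try: shutil.rmtree(...)` block and the following `output_dir.mkdir()`.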

@fkrueger

My workaround from the beginning of this year works for the most part.
No "~" problem there ;-)

@Huddeij

Huddeij commented Apr 18, 2024

How about one of the devs adjusting the CSV playlist import function after more than 8 months of this? The Python script is great and all, but I think it's past the point where we should still need a hack to import playlists, isn't it?

Please update Invidious' CSV playlist import function to the latest schema of Google's export.

@fkrueger

Since the workaround I submitted at the beginning of January 2024 was first labeled "uncompleted" and is now labeled "stale", while commenting on the pull request is disabled, I wonder..

Just what is the problem with the patch?

I just re-patched and recompiled it from scratch, used the current Google Takeout format (in English!), and it still works as well as before.

Can anyone more knowledgeable please tell me why the patch is labeled "uncompleted", and more importantly, what I can do about it?

Thanks! :-)

@jlj2

jlj2 commented Jun 16, 2024

A solution that worked for me is given below.

Developers, please note first that the second script, posted above by @Ajimaru, worked for me for some playlists, apparently those whose filename ending the script was able to change from -video.csv to .csv. However, most playlists failed to get converted. What I did: I first cleared the Google Takeout playlists folder of all files except .csv and .CSV files, and removed dictionary.csv as well. The script modified some of the playlists correctly, as stated, and I later removed these successful playlists. I saved Ajimaru's script as YoutubePlaylistsToModifyForInvidiousWithNewCodeThatWorks in the parent folder (../), made it executable (chmod +x YoutubePlaylistsToModifyForInvidiousWithNewCodeThatWorks), and told it to write the new playlists to a new folder, /tmp/TestNewPlaylistsWithHyphenVideosRemoved. It gave the following error (my home folder name and the folder name in Documents have been changed in this report):

$ ../YoutubePlaylistsToModifyForInvidiousWithNewCodeThatWorks . /tmp/TestNewPlaylistsWithHyphenVideosRemoved/
Traceback (most recent call last):
  File "/home/myhomefolder/Documents/IT - youtube/takeout-20240614T172347Z-001 - playlists only perhaps/takeout-20240614T173259Z-001 - playlists perhaps only/../YoutubePlaylistsToModifyForInvidiousWithNewCodeThatWorks", line 26, in <module>
    playlist_name = record["Playlist title (original)"] # title with lower case t
                    ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'Playlist title (original)'

An alternative that worked for me: Invidious can import the playlists as YouTube currently exports them if you make the following modifications to each playlist, one at a time:

  1. Create a new folder where your newly formatted playlists will be kept.

  2. Copy over those playlists there.

  3. Rename each playlist's filename by changing endings from -video.csv to .csv (optional).

  4. The first line in each playlist should be changed from:

Video ID,Playlist Video Creation Timestamp

To:

Video Id,Unused 1,Unused 2,Unused 3,Title,Description,Visibility
,,,,PlaylistNameGoesHere,,Private

Video Id,Unused 1,Unused 2,Unused 3,Title,Description,Visibility

In the second line, change PlaylistNameGoesHere to your playlist's name. There is no need to add .csv, because this field is the name you gave the playlist, not its filename. If the name contains spaces, type them as they are, without quotation marks. For example:

Video Id,Unused 1,Unused 2,Unused 3,Title,Description,Visibility
,,,,My playlist name,,Private

Video Id,Unused 1,Unused 2,Unused 3,Title,Description,Visibility

  5. In your file manager, right-click the playlist and choose to open it in LibreOffice Writer. This is a simple way to edit the file, although it could also be edited in LibreOffice Calc or similar.

  6. In LibreOffice Writer, modify the file contents to look the way they were presumably formatted in earlier exports: search for the following, with 'Regular expressions' ticked.
    Find:
    ,2........................

    Replace all instances with:
    ,,,,,,

    Presumably, if some video was added (or created?) to the playlist before the year 2000, you could experiment by searching for the following (because the search above was for videos added with a datestamp from 2000 onwards):
    ,1........................
    and replacing all with ,,,,,, as above.

  7. Save the file, but do not accept the offer to change the file format to ODF or any other format; choose 'Use Text Format' to keep its original format.

  8. You can then import the playlist into your Invidious instance. For example, once you are logged in to your Invidious homepage, you may find links to 'SUBSCRIPTIONS' and 'PLAYLISTS'. Click the latter, then click 'Import/export'. Next to 'Import YouTube playlist (.csv)', choose your newly formatted playlist file and click 'Import' below. Give your browser a few moments to upload the file. If all goes well, your playlist will then be listed on the 'PLAYLISTS' page. This worked for me on one Invidious instance today.
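The header change and the LibreOffice find-and-replace above can also be scripted. In this sketch (`convert_export` is a hypothetical helper), the regular expression matches the timestamp by its shape (a comma followed by YYYY-MM-DDT...) rather than a leading 2, so videos added before 2000 are covered too. It assumes each data line has exactly the two columns "Video ID,Playlist Video Creation Timestamp"; like the manual steps, it writes the playlist name unquoted, so a name containing a comma would still need quoting.

```python
import re

LEGACY_HEADER = "Video Id,Unused 1,Unused 2,Unused 3,Title,Description,Visibility"

# Matches ",<ISO 8601 timestamp>" at the end of a line, e.g. ",2019-10-11T00:13:53+00:00".
TIMESTAMP_COLUMN = re.compile(r",\d{4}-\d{2}-\d{2}T[^,\r\n]*$", re.MULTILINE)


def convert_export(text: str, playlist_name: str, visibility: str = "Private") -> str:
    """Convert one exported playlist CSV (as text) to the legacy import format."""
    lines = text.splitlines()
    # Drop the "Video ID,Playlist Video Creation Timestamp" header line.
    body = "\n".join(lines[1:])
    # Same effect as replacing ",2...." with ",,,,,," in LibreOffice.
    body = TIMESTAMP_COLUMN.sub(",,,,,,", body)
    head = f"{LEGACY_HEADER}\n,,,,{playlist_name},,{visibility}\n\n{LEGACY_HEADER}\n"
    return head + body + "\n"


exported = (
    "Video ID,Playlist Video Creation Timestamp\n"
    "j-qzjKaIZnc,2019-10-11T00:13:53+00:00\n"
)
print(convert_export(exported, "My playlist name"))
```

The printed result has the same four header lines as the example above, followed by `j-qzjKaIZnc,,,,,,`.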

Note that these playlists also displayed correctly in Playlet, an app ('channel') for watching YouTube without ads on Roku that uses Invidious as a backend, including the playlists and subscription lists imported into Invidious:
https://channelstore.roku.com/details/840aec36f51bfe6d96cf6db9055a372a/playlet
https://www.reddit.com/r/selfhosted/comments/zb07p0/playlet_an_invidious_frontend_for_roku_tv/

@Nardo86

Nardo86 commented Oct 25, 2024

Modified version for Italian Takeout exports (I just finished converting all my YouTube music playlists with it): https://github.com/Nardo86/VarieEdEventuali/blob/main/InvidiousYTPlaylistConverter
