Error while loading data using joblib and import errors of the dependent files for models #376

Open
Rupesh-Darimisetti opened this issue Apr 14, 2024 · 5 comments


@Rupesh-Darimisetti

The repository is unable to import the helper modules that the models use from the tools path. Appending the current directory with

sys.path.append(".")

resolves the import error for

from tools.email_preprocess import preprocess

but after that, loading the data files with joblib fails with:

python naive_bayes/nb_author_id.py 
Traceback (most recent call last):
  File "D:\02_learning\udacity\machine_learning\ud120-projects\naive_bayes\nb_author_id.py", line 24, in <module>
    features_train, features_test, labels_train, labels_test = preprocess()
                                                               ^^^^^^^^^^^^
  File "D:\02_learning\udacity\machine_learning\ud120-projects\tools\email_preprocess.py", line 31, in preprocess
    word_data = joblib.load(open("tools/word_data.pkl","rb"))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\02_learning\udacity\machine_learning\ud120-projects\venv\Lib\site-packages\joblib\numpy_pickle.py", line 648, in load
    obj = _unpickle(fobj)
          ^^^^^^^^^^^^^^^
  File "D:\02_learning\udacity\machine_learning\ud120-projects\venv\Lib\site-packages\joblib\numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
          ^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2544.0_x64__qbz5n2kfra8p0\Lib\pickle.py", line 1213, in load
    dispatch[key[0]](self)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2544.0_x64__qbz5n2kfra8p0\Lib\pickle.py", line 1337, in load_string
    raise UnpicklingError("the STRING opcode argument must be quoted")
_pickle.UnpicklingError: the STRING opcode argument must be quoted
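
sys.path.append(".") only helps when the script is launched from the repository root, because "." is resolved against the current working directory; the relative path "tools/word_data.pkl" in the traceback has the same dependency. A minimal sketch of a working-directory-independent alternative, assuming the layout shown in the traceback (scripts in subfolders such as naive_bayes/, helpers in tools/ one level up). The helper names find_repo_root and add_repo_to_sys_path are hypothetical and not part of the repository:

```python
import os
import sys


def find_repo_root(script_file, levels_up=1):
    """Return the directory `levels_up` levels above the folder holding script_file.

    Hypothetical helper: with levels_up=1, a script at
    <root>/naive_bayes/nb_author_id.py resolves to <root>.
    """
    path = os.path.dirname(os.path.abspath(script_file))
    for _ in range(levels_up):
        path = os.path.dirname(path)
    return path


def add_repo_to_sys_path(script_file, levels_up=1):
    """Prepend the repository root to sys.path (only once) and return it.

    Unlike sys.path.append("."), this depends on where the script file
    lives, not on the current working directory.
    """
    root = find_repo_root(script_file, levels_up)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

At the top of naive_bayes/nb_author_id.py this could be called as root = add_repo_to_sys_path(__file__) before the from tools.email_preprocess import preprocess line, and data files could likewise be opened via os.path.join(root, "tools", "word_data.pkl"). This only addresses the import and path problems, not the UnpicklingError.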
@Rupesh-Darimisetti
Author

@jaycode @13rac1 @nmb10 @richardkalehoff please fix the codebase so that the machine learning models can run.

@Rupesh-Darimisetti Rupesh-Darimisetti changed the title Error while loading data using jobpickle and import errors of the dependent fiels for models Error while loading data using joblib and import errors of the dependent fiels for models Apr 14, 2024
@jaycode
Contributor

jaycode commented Apr 18, 2024

Hi @Rupesh-Darimisetti, the code was set up to use Python 3.6.3. It looks like you are using 3.11, which causes unpickling to fail. I suggest creating a new virtual environment with that version to run it.

@Rupesh-Darimisetti
Author

Even after switching to Python 3.6.8 and reinstalling the venv, it throws the following error @jaycode:

python naive_bayes/nb_author_id.py
Traceback (most recent call last):
  File "naive_bayes/nb_author_id.py", line 24, in <module>
    features_train, features_test, labels_train, labels_test = preprocess()
  File ".\tools\email_preprocess.py", line 34, in preprocess
    word_data = joblib.load(words_file_handler)
  File "D:\02_learning\udacity\machine_learning\ud120-projects\venv\lib\site-packages\joblib\numpy_pickle.py", line 577, in load
    obj = _unpickle(fobj)
  File "D:\02_learning\udacity\machine_learning\ud120-projects\venv\lib\site-packages\joblib\numpy_pickle.py", line 506, in _unpickle
    obj = unpickler.load()
  File "C:\Users\rupes\AppData\Local\Programs\Python\Python36-32\lib\pickle.py", line 1050, in load
    dispatch[key[0]](self)
  File "C:\Users\rupes\AppData\Local\Programs\Python\Python36-32\lib\pickle.py", line 1174, in load_string
    raise UnpicklingError("the STRING opcode argument must be quoted")
_pickle.UnpicklingError: the STRING opcode argument must be quoted
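
The same exception appears under both Python 3.11 and 3.6.8, which suggests the interpreter version is not the root cause. The message is raised by pickle's handler for the protocol-0 STRING opcode, which reads the rest of the line and requires it to start and end with a matching quote; a stray carriage return left over from a CRLF line ending breaks that check. A minimal sketch reproducing the error with a small hand-written pickle (not the repository's data files):

```python
import pickle

# A hand-written protocol-0 pickle: the STRING opcode "S" followed by a quoted
# argument on its own line, a PUT into memo slot 0 ("p0"), and STOP (".").
GOOD = b"S'hello'\np0\n."

# The same bytes after an LF -> CRLF conversion, as Git may do on Windows.
BAD = GOOD.replace(b"\n", b"\r\n")


def try_load(data):
    """Return the unpickled object, or the UnpicklingError message on failure."""
    try:
        return pickle.loads(data)
    except pickle.UnpicklingError as exc:
        return str(exc)


print(try_load(GOOD))  # hello
print(try_load(BAD))   # the STRING opcode argument must be quoted
```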

@Keroshi

Keroshi commented May 24, 2024

I was also facing the same issue. After reading a few Stack Overflow threads, I found that the line-separator format of both word_data.pkl and email_author.pkl was causing the errors. Converting their line endings from CRLF (or CR) to LF might do the trick.
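
The line-ending conversion described in the comment above can be sketched as follows, assuming the .pkl files are text-mode (protocol 0) pickles, which is consistent with the STRING opcode in the tracebacks, and that CRLF line endings are the only damage. The function name convert_crlf_to_lf is hypothetical; it writes a corrected copy instead of modifying the original, and it does not handle bare CR endings:

```python
def convert_crlf_to_lf(src, dst):
    """Copy the pickle at src to dst with every CRLF line ending turned into LF.

    Assumes a protocol-0 pickle whose only damage is CRLF line endings.
    Protocol 0 escapes newlines inside string data, so a CRLF pair should
    only appear as a line terminator. Returns the number of pairs replaced.
    """
    with open(src, "rb") as infile:
        data = infile.read()
    count = data.count(b"\r\n")
    with open(dst, "wb") as outfile:
        outfile.write(data.replace(b"\r\n", b"\n"))
    return count
```

For example, convert_crlf_to_lf("tools/word_data.pkl", "tools/word_data_unix.pkl") (the output name is hypothetical), then point the loading code at the new file or replace the original once the copy loads. Marking the files as binary in .gitattributes (for example *.pkl binary) before checking them out should stop Git from rewriting their line endings in the first place.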

@mirandaO

mirandaO commented Mar 13, 2025

I tried everything suggested here, and even used ChatGPT to try to solve the problem, but after spending a couple of hours I gave up. Since I want to spend my time learning ML concepts rather than solving configuration or setup issues, I went with a basic workaround.

My solution for running the nb_author_id.py script without any library errors was to:
1. Use Python 2.7.18.
2. Go back to the last commit before the Python 3 migration by running:
   git checkout 9ee2a2e
3. Change the line "scipy==0.18.1" in requirements.txt to "scipy>=0.18.1" and save the changes.
4. Run C:\Python27\python.exe -m pip install -r requirements.txt
5. Run C:\Python27\python.exe nb_author_id.py
