Not possible to call out to external websites #141

Open
markwilkinson opened this issue Mar 5, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@markwilkinson

markwilkinson commented Mar 5, 2024

Description

In both my own jupyterlite, and in the demo jupyterlite, it is not possible to call out to external websites. It always results in an error related to insecure requests. This happens with all URLs that I have tested, and happens whether or not the request call includes a "validate=true/false" flag.

Reproduce

  1. Code block:
import requests

def download_file_into_memory(url):
    response = requests.get(url)
    if response.status_code == 200:
        return response.content
    else:
        print(f"Failed to download file. Status code: {response.status_code}")
        return None


file_content = download_file_into_memory("https://cnn.com")
  2. Run

  3. See error:

/lib/python3.11/site-packages/urllib3/connectionpool.py:1101: InsecureRequestWarning: Unverified HTTPS request is being made to host 'cnn.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
---------------------------------------------------------------------------
JsException                               Traceback (most recent call last)
File /lib/python3.11/site-packages/urllib3/contrib/emscripten/fetch.py:380, in send_request(request)
    378         js_xhr.setRequestHeader(name, value)
--> 380 js_xhr.send(to_js(request.body))
    382 headers = dict(Parser().parsestr(js_xhr.getAllResponseHeaders()))

JsException: NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'https://cnn.com/'.

During handling of the above exception, another exception occurred:

@markwilkinson markwilkinson added the bug Something isn't working label Mar 5, 2024
@jtpio
Member

jtpio commented Mar 6, 2024

@markwilkinson Could it be because cnn.com redirects to edition.cnn.com? Using https://edition.cnn.com/ directly in the code seems to be working fine:

[Screenshot omitted]

@markwilkinson
Author

markwilkinson commented Mar 7, 2024

I don't think that's the problem... It seems that https://edition.cnn.com is the exception to the rule! I have added the auto-redirect flag and that doesn't solve the problem for any of the URLs that I want to use. I have also tried using https://github.com and https://google.ca and https://www.cbgp.upm.es (this last one I know for sure does not redirect). I have also tried in two browsers.

None of these work.

So I think the problem is real!

@markwilkinson
Author

I have also tried connecting directly to my server rather than the https reverse proxy (http://....) and that also throws an error (different error), but I have a feeling that Jupyter doesn't allow insecure connections anyway, so that might not be informative...??

@markwilkinson
Author

Have you had any further thoughts on this? I am still unable to resolve any URL, using the demo jupyterlite, other than the one you discovered that worked (edition.cnn.com). I have also tried starting from a new notebook, running %pip install requests and then trying to reach any website... same problem in all cases.

@markwilkinson
Author

Hi again! Have you (or anyone) found a work-around for this? I'm so excited to use jupyterlite, but all of the projects I need it for will be downloading their data from the Web, so... this is a real show-stopper for me!

Advice very welcome!

@epugh

epugh commented Apr 18, 2024

Have you tried using fetch... so, this isn't to an external site, but check out these examples of notebooks that I run in jupyterlite: https://github.com/o19s/quepid-jupyterlite/blob/main/jupyterlite/files/examples/Multiple%20Raters%20Analysis.ipynb

Maybe because "fetch" is javascript???

@markwilkinson
Author

Thanks for the suggestion! Unfortunately, that didn't work either, and with near-identical symptoms: the "await fetch" fails with "JsException: TypeError: Failed to fetch" for all URLs other than the one we identified at the top of this issue report (https://edition.cnn.com).

So... unless I am interested in what CNN has to say (I'm not), I continue to be out of luck! ;-)

@mrkvn

mrkvn commented May 6, 2024

I believe this is because of CORS. I'm not sure, but I think there's no way around it; it's a browser security feature. You can hit a valid API endpoint, though. For what you are trying to do, you'd need a server: your server would then be the one that sends the HTTP request to the endpoint you want to hit. You might want to read this issue comment: jupyterlite/jupyterlite#729 (comment)
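
A rough sketch of the "your server sends the request" idea, using only the Python standard library. Everything named here is hypothetical and not taken from JupyterLite or from this thread: the `ProxyHandler` class, the `?url=` query parameter, the `is_allowed` helper, and the `example.org` allowlist. Error handling for failed upstream requests is left out, so treat it as an illustration rather than something to deploy.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit
from urllib.request import urlopen

# Hypothetical allowlist: only these upstream hosts may be fetched via the proxy.
ALLOWED_HOSTS = {"example.org"}


def is_allowed(target_url, allowed_hosts=ALLOWED_HOSTS):
    """Return True if target_url is an http(s) URL whose host is allowlisted."""
    parts = urlsplit(target_url)
    return parts.scheme in ("http", "https") and parts.hostname in allowed_hosts


class ProxyHandler(BaseHTTPRequestHandler):
    """Hypothetical handler for GET /?url=<target>: fetches the target server-side."""

    def do_GET(self):
        query = parse_qs(urlsplit(self.path).query)
        target = query.get("url", [""])[0]
        if not is_allowed(target):
            self.send_error(403, "Target host not allowed")
            return
        # The server, not the browser, makes this request, so CORS does not apply.
        with urlopen(target, timeout=10) as upstream:
            body = upstream.read()
            content_type = upstream.headers.get(
                "Content-Type", "application/octet-stream"
            )
        self.send_response(200)
        # This header is what lets the notebook page read the relayed response.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_proxy_server(host="127.0.0.1", port=0):
    """Build (but do not start) the proxy; port=0 lets the OS pick a free port."""
    return HTTPServer((host, port), ProxyHandler)

# To run it: make_proxy_server(port=8000).serve_forever()
```

The allowlist is there because a proxy that relays arbitrary URLs can be abused by anyone who finds it.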

@markwilkinson
Author

Interesting! In most cases, I run the servers that I need to talk to from Jupyter, so I will try reconfiguring them to accept all in CORS. For the other cases, I will try your proxy ideas.

Thanks!! If this is the problem, then I suspect it's going to be hard to fix in jupyterlite itself... which is sad! But a proxy is fine.

I'll report back here if this solves the problem. Thanks for the suggestion @mrkvn !
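
For a server you control, "accept all in CORS" generally comes down to sending an `Access-Control-Allow-Origin: *` header and answering preflight `OPTIONS` requests. A minimal sketch with the Python standard library follows; the names `cors_headers`, `CORSRequestHandler`, and `make_server`, and the JSON payload, are hypothetical. In practice these headers are often set in the web server or reverse proxy configuration instead, which this sketch does not cover.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def cors_headers(allowed_origin="*"):
    """Headers that tell the browser which origins may read the response."""
    return {
        "Access-Control-Allow-Origin": allowed_origin,
        "Access-Control-Allow-Methods": "GET, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type",
    }


class CORSRequestHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves a JSON payload readable from any origin."""

    def _send_cors_headers(self):
        for name, value in cors_headers().items():
            self.send_header(name, value)

    def do_OPTIONS(self):
        # Preflight request: reply with the CORS headers and an empty body.
        self.send_response(204)
        self._send_cors_headers()
        self.end_headers()

    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self._send_cors_headers()
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="127.0.0.1", port=0):
    """Build (but do not start) the server; port=0 lets the OS pick a free port."""
    return HTTPServer((host, port), CORSRequestHandler)

# To run it: make_server(port=8000).serve_forever()
```

Passing a specific origin to `cors_headers` instead of `"*"` restricts access to pages served from that origin.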

@markwilkinson
Author

@mrkvn this did solve the problem. It was necessary also to explicitly install support for https. Now it's all good! Thanks!

@psychemedia
Contributor

I've been using something of the following form to make simple proxied requests. It gives me a response object r that I can use as r.text, r.content, or r.json():

import json
import requests
from urllib.parse import quote, urlencode

class ProxyResponse:
    """Minimal stand-in for requests.Response, wrapping the decoded body text."""

    def __init__(self, content):
        self._content = content  # response body as str

    @property
    def text(self):
        return self._content

    def json(self):
        return json.loads(self._content)

    @property
    def content(self):
        # Re-encode to bytes, mirroring requests.Response.content
        return self._content.encode()

def cors_proxy_request(url, params=None):
    """CORS proxy for GET resources with requests-like response."""
    if params:
        full_url = f"{url}?{urlencode(params)}"
    else:
        full_url = url

    # Route the request through the public corsproxy.io service
    proxy_url = f"https://corsproxy.io/?{quote(full_url)}"
    response = requests.get(proxy_url).content.decode().strip()
    return ProxyResponse(response)
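
The URL-building step above can be pulled out into a pure function, which makes it possible to check without any network access. This is a sketch under assumptions: the name `build_proxy_url` is hypothetical, the `https://corsproxy.io/?` prefix is copied from the snippet above (the service may change its URL format), and merging `params` into a query string already present in `url` is an addition that the original does not do.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

# Prefix copied from the snippet above; treat it as an assumption about the service.
PROXY_PREFIX = "https://corsproxy.io/?"


def build_proxy_url(url, params=None, proxy_prefix=PROXY_PREFIX):
    """Return the proxied form of url, with params merged into its query string."""
    parts = urlsplit(url)
    # Start from any query parameters already present in the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    if params:
        query.extend(params.items())
    full_url = urlunsplit(parts._replace(query=urlencode(query)))
    # safe="" percent-encodes "/", "?" and "&" as well, so the target URL
    # travels as a single opaque value.
    return proxy_prefix + quote(full_url, safe="")


print(build_proxy_url("https://example.org/data?a=1", {"b": "2"}))
# prints https://corsproxy.io/?https%3A%2F%2Fexample.org%2Fdata%3Fa%3D1%26b%3D2
```

Note that round-tripping the query through `parse_qsl` and `urlencode` can normalise its encoding (for example, spaces become `+`).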
