Not possible to call out to external websites #141

markwilkinson · 2024-03-05T13:08:12Z

Description

In both my own JupyterLite deployment and the demo JupyterLite, it is not possible to call out to external websites. Every attempt results in an error related to insecure requests. This happens with all URLs that I have tested, whether or not the request call includes a "validate=true/false" flag.

Reproduce

Code block:

import requests

def download_file_into_memory(url):
    response = requests.get(url)
    if response.status_code == 200:
        return response.content
    else:
        print(f"Failed to download file. Status code: {response.status_code}")
        return None


file_content = download_file_into_memory("https://cnn.com")

Run
See error:

/lib/python3.11/site-packages/urllib3/connectionpool.py:1101: InsecureRequestWarning: Unverified HTTPS request is being made to host 'cnn.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
---------------------------------------------------------------------------
JsException                               Traceback (most recent call last)
File /lib/python3.11/site-packages/urllib3/contrib/emscripten/fetch.py:380, in send_request(request)
    378         js_xhr.setRequestHeader(name, value)
--> 380 js_xhr.send(to_js(request.body))
    382 headers = dict(Parser().parsestr(js_xhr.getAllResponseHeaders()))

JsException: NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'https://cnn.com/'.

During handling of the above exception, another exception occurred:

jtpio · 2024-03-06T15:26:45Z

@markwilkinson Could it be because cnn.com redirects to edition.cnn.com? Using https://edition.cnn.com/ directly in the code seems to work fine.

markwilkinson · 2024-03-07T08:02:13Z

I don't think that's the problem... It seems that https://edition.cnn.com is the exception to the rule! I have added the auto-redirect flag and that doesn't solve the problem for any of the URLs that I want to use. I have also tried using https://github.com and https://google.ca and https://www.cbgp.upm.es (this last one I know for sure does not redirect). I have also tried in two browsers.

None of these work.

So I think the problem is real!

markwilkinson · 2024-03-07T08:03:38Z

I have also tried connecting directly to my server rather than the https reverse proxy (http://....) and that also throws an error (different error), but I have a feeling that Jupyter doesn't allow insecure connections anyway, so that might not be informative...??

markwilkinson · 2024-03-12T14:58:26Z

Have you had any further thoughts on this? I am still unable to resolve any URL, using the demo jupyterlite, other than the one you discovered that worked (edition.cnn.com). I have also tried starting from a new notebook, running %pip install requests and then trying to reach any website... same problem in all cases.

markwilkinson · 2024-04-18T09:57:42Z

Hi again! Have you (or anyone) found a work-around for this? I'm so excited to use jupyterlite, but all of the projects I need it for will be downloading their data from the Web, so... this is a real show-stopper for me!

Advice very welcome!

epugh · 2024-04-18T13:41:39Z

Have you tried using fetch... so, this isn't to an external site, but check out these examples of notebooks that I run in jupyterlite: https://github.com/o19s/quepid-jupyterlite/blob/main/jupyterlite/files/examples/Multiple%20Raters%20Analysis.ipynb

Maybe because "fetch" is javascript???

markwilkinson · 2024-04-22T07:51:13Z

Thanks for the suggestion! Unfortunately, that didn't work either, and with ~identical symptoms. The "await fetch" fails with "JsException: TypeError: Failed to fetch" for all URLs other than the one we identified at the top of this issue report (https://edition.cnn.com).

So... unless I am interested in what CNN has to say (I'm not), I continue to be out of luck! ;-)

mrkvn · 2024-05-06T10:20:43Z

I believe this is because of CORS. I'm not sure, but I think there's no way around it: it's a browser security feature. You can still hit an API endpoint that allows cross-origin requests. For what you are trying to do, you'd need a server of your own, and that server would be the one to send the HTTP request to the endpoint you want to hit. You might want to read this issue: jupyterlite/jupyterlite#729 (comment)
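
To make the CORS point concrete, here is a minimal sketch of the decision a browser makes before handing a cross-origin response to page code. It is a simplification that assumes a plain GET without credentials or preflight; the function name `browser_allows_read` and the origin `https://jupyterlite.example` are hypothetical and used only for illustration.

```python
def browser_allows_read(page_origin: str, response_headers: dict[str, str]) -> bool:
    """Simplified CORS check for a plain, credential-free GET.

    Returns True if page code running at `page_origin` would be allowed
    to read a cross-origin response carrying `response_headers`.
    """
    # Header names are case-insensitive, so normalise before the lookup.
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    if allowed is None:
        # No CORS header at all: the browser blocks the read.
        return False
    # Either a wildcard or an exact match on the requesting origin.
    return allowed == "*" or allowed == page_origin


# A site that sends no CORS header is blocked; one that opts in is readable.
print(browser_allows_read("https://jupyterlite.example", {"Content-Type": "text/html"}))
print(browser_allows_read("https://jupyterlite.example", {"Access-Control-Allow-Origin": "*"}))
```

If this is the cause, it would explain why the failure shows up in both `requests` and `fetch`: in the browser, both ultimately go through the same cross-origin check.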

markwilkinson · 2024-05-22T15:29:50Z

Interesting! In most cases, I run the servers that I need to talk to from Jupyter, so I will try reconfiguring them to accept all in CORS. For the other cases, I will try your proxy ideas.

Thanks!! If this is the problem, then I suspect it's going to be hard to fix in jupyterlite itself... which is sad! But a proxy is fine.

I'll report back here if this solves the problem. Thanks for the suggestion @mrkvn !
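
For servers you control, "accept all in CORS" roughly means adding an `Access-Control-Allow-Origin: *` header to every response. The following is a minimal sketch using only the Python standard library; the handler name `CORSHandler`, the helper `make_server`, and the JSON payload are assumptions for illustration, not the configuration actually used in this thread. A production server would normally set the same headers in its web server or framework configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class CORSHandler(BaseHTTPRequestHandler):
    """Toy handler that allows any origin to read its responses."""

    def _send_cors_headers(self):
        # "*" lets any page origin read the response (no credentials).
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "GET, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "*")

    def do_OPTIONS(self):
        # Answer preflight requests with the same permissive headers.
        self.send_response(204)
        self._send_cors_headers()
        self.end_headers()

    def do_GET(self):
        body = b'{"status": "ok"}'  # placeholder payload
        self.send_response(200)
        self._send_cors_headers()
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the example quiet; remove to restore request logging.
        pass


def make_server(port: int = 0) -> HTTPServer:
    """Create a server on localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), CORSHandler)
```

Calling `make_server(8000).serve_forever()` would start it locally. Note that a page served over https will typically also refuse to call a plain http endpoint, so the real server still needs TLS in front of it.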

markwilkinson · 2024-06-03T07:46:15Z

@mrkvn this did solve the problem. It was necessary also to explicitly install support for https. Now it's all good! Thanks!

psychemedia · 2024-10-24T12:05:53Z

I've been using something of the following form to make simple proxied requests. It gives me a response object r that exposes r.text, r.content, and r.json():

import json
import requests
from urllib.parse import quote, urlencode

class ProxyResponse:
    """Minimal stand-in for a requests response, backed by decoded text."""

    def __init__(self, content):
        self._content = content

    @property
    def text(self):
        return self._content

    def json(self):
        return json.loads(self._content)

    @property
    def content(self):
        # Re-encoded from the decoded text, so only suitable for text resources.
        return self._content.encode()

def cors_proxy_request(url, params=None):
    """CORS proxy for GET resources with requests-like response."""
    if params:
        full_url = f"{url}?{urlencode(params)}"
    else:
        full_url = url

    # The target URL is percent-encoded and appended to the proxy's query string.
    proxy_url = f"https://corsproxy.io/?{quote(full_url)}"
    response = requests.get(proxy_url).content.decode().strip()
    return ProxyResponse(response)
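
One case the snippet above does not cover is a `url` that already carries a query string while `params` is also given, which would produce two `?` characters. Below is a hedged sketch of the URL-building step on its own, so it can be checked without network access. The helper name `build_proxy_url` is hypothetical, the proxy prefix is copied from the snippet above, and encoding every reserved character with `safe=""` is my assumption about what the proxy accepts rather than documented behaviour.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

# Prefix taken from the snippet above; not verified against the proxy's docs.
PROXY_PREFIX = "https://corsproxy.io/?"


def build_proxy_url(url, params=None):
    """Merge `params` into `url`, then wrap the result for the CORS proxy."""
    parts = urlsplit(url)
    # Keep any query parameters already present in the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend((params or {}).items())
    full_url = urlunsplit(parts._replace(query=urlencode(query)))
    # safe="" also encodes "/", so the target URL is a single opaque value.
    return PROXY_PREFIX + quote(full_url, safe="")


print(build_proxy_url("https://example.org/data?a=1", {"b": "2"}))
# → https://corsproxy.io/?https%3A%2F%2Fexample.org%2Fdata%3Fa%3D1%26b%3D2
```

The returned string could then be passed to `requests.get` in place of `proxy_url` in `cors_proxy_request`.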

markwilkinson added the bug (Something isn't working) label · Mar 5, 2024
