
cache_key should be dependent on chosen_algorithm #53

Closed
xmcp opened this issue Oct 5, 2024 · 10 comments · Fixed by #54

Comments

@xmcp (Contributor) commented Oct 5, 2024

Currently, the after_request function looks up the cached response via key = self.cache_key(request).

However, this key does not take chosen_algorithm into account, so if one request asks for brotli, the brotli-compressed body is cached, and subsequent gzip requests are incorrectly served the brotli body.

Steps to reproduce:

```python
from flask import Flask
from flask_compress import Compress
from flask_caching import Cache

# --- below is copied from the README ---

app = Flask(__name__)

cache = Cache(app, config={
    'CACHE_TYPE': 'simple',
    'CACHE_DEFAULT_TIMEOUT': 60 * 60,  # 1 hour cache timeout
})

# Define a function to return the cache key for incoming requests
def get_cache_key(request):
    return request.url

# Initialize Flask-Compress
compress = Compress()
compress.init_app(app)

# Set up cache for compressed responses
compress.cache = cache
compress.cache_key = get_cache_key

# --- above is copied from the README ---

@app.route('/')
def index():
    return 'hello' * 100000

app.run('127.0.0.1', 5000)
```
```
$ curl http://127.0.0.1:5000 -H "Accept-Encoding: br"
...
$ curl http://127.0.0.1:5000 --compressed -v
*   Trying 127.0.0.1:5000...
* Connected to 127.0.0.1 (127.0.0.1) port 5000
> GET / HTTP/1.1
> Host: 127.0.0.1:5000
> User-Agent: curl/8.7.1
> Accept: */*
> Accept-Encoding: deflate, gzip
>
< HTTP/1.1 200 OK
< Server: Werkzeug/3.0.1 Python/3.12.2
< Date: Sat, 05 Oct 2024 20:46:08 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 17
< Vary: Accept-Encoding
< Content-Encoding: gzip
< Connection: close
<
* Error while processing content unencoding: incorrect header check
* Closing connection
curl: (61) Error while processing content unencoding: incorrect header check
```
@alexprengere (Collaborator) commented Oct 6, 2024

Thanks for digging into this. I think this can (and perhaps should) also be solved by setting get_cache_key more carefully. Using this solves the issue:

```python
def get_cache_key(request):
    # EDIT: the cache key must always be a string
    return f"{request.url};{request.headers.get('Accept-Encoding')}"
```

The Accept-Encoding header is translated directly into chosen_algorithm once _choose_compress_algorithm is called, so the end result is the same as your PR #54.

I agree the current situation needs to be improved, either by merging your PR or by updating the docs with better examples and a warning about this pitfall.

@xmcp (Contributor, Author) commented Oct 6, 2024

Setting the cache key based on Accept-Encoding may introduce redundancy in the cache, since different Accept-Encoding values can lead to the same chosen_algorithm. This is not a huge problem, but I would argue that my PR is the more technically correct fix.
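To illustrate the redundancy concern, here is a minimal, hypothetical sketch of content-encoding negotiation (not flask-compress's actual _choose_compress_algorithm, and the server preference order is an assumption). It shows that two different Accept-Encoding header values can negotiate the same algorithm, so keying the cache on the raw header would store the same compressed body twice:

```python
# Hypothetical negotiation sketch, NOT flask-compress's real implementation.
# Server preference order is an assumption for this example.
SUPPORTED = ["br", "gzip", "deflate"]

def choose_algorithm(accept_encoding):
    # Parse the header into the set of offered encodings, ignoring q-values.
    offered = {token.split(";")[0].strip() for token in accept_encoding.split(",")}
    for algo in SUPPORTED:
        if algo in offered:
            return algo
    return None

# Two distinct header values, one negotiated algorithm: keying on the raw
# header would create two cache entries holding the same gzip body.
print(choose_algorithm("gzip"))                 # gzip
print(choose_algorithm("deflate, gzip;q=0.9"))  # gzip
```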

@alexprengere (Collaborator) commented:

I am inclined to agree. Note that in practice there are not many distinct Accept-Encoding values out there, as this set is "statically" fixed per browser.

@randomcascade commented:
Hope I'm not dogpiling, but I 100% agree that the solution that @xmcp provided is more elegant. I've implemented the cache key trick mentioned above and while it works, the new fix is highly desirable for those of us using both flask-caching and flask-compress in production. Huge kudos.

@alexprengere (Collaborator) commented Oct 11, 2024

After more testing, it turns out that releasing this would have broken quite a lot of things (for starters, pretty much all my projects 😉).

Flask-Caching relies on cachelib. If you look at the documentation of cache.get, you will see that the key must be of type str.

It does not show in the example above, as the default simple cache relies on a dict, so using a tuple as a key will "just work", even though it will not pass static type checking.
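As a quick illustration (plain Python standing in for what the simple cache does internally): a dict accepts any hashable key, which is why a tuple key appears to work there, while a str key works with every backend:

```python
# The simple cache is backed by a dict, and a dict accepts any hashable
# key, so a (url, algorithm) tuple "just works" as a cache key there.
cache = {}
key = ("http://127.0.0.1:5000/", "br")
cache[key] = b"brotli-compressed-body"
print(key in cache)  # True

# A string key works everywhere, including backends that derive file
# paths from the key, which is why the final patch serializes to str.
str_key = ";".join(key)
cache[str_key] = b"brotli-compressed-body"
print(str_key)  # http://127.0.0.1:5000/;br
```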

The trouble begins when you use another cache type. Personally, I often rely on FileSystemCache, as it allows Flask applications served by multiple processes to share a common cache. If you try this with the above patch, it fails: there is code that builds a path from the key, and this breaks when the key is not a str.

The error I got when testing was not obvious and did not match the `raise TypeError` from cachelib:

```
UnboundLocalError: cannot access local variable 'bkey_hash' where it is not associated with a value
```

I was a bit unlucky here, as the clear error in cachelib for non-str keys was only added in version 0.10, and the latest release of flask-caching still pins cachelib<0.10.

So, the next steps are:

  • revert the PR (done)
  • change the patch so that the key is always a str
  • definitely add a test case for this, involving FileSystemCache

In hindsight, I should have required a test before merging, even though with the simple cache it would have passed.
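A minimal sketch of step 2 (the helper name is hypothetical, not the actual flask-compress patch): embed the negotiated algorithm in the key and guarantee the result is always a str, whatever the user-supplied cache_key function returned:

```python
def make_cache_key(base_key, chosen_algorithm):
    # The f-string coerces base_key to str even if the user-supplied
    # cache_key function returned a tuple or other object, so path-based
    # backends like FileSystemCache keep working.
    return f"{base_key};{chosen_algorithm}"

print(make_cache_key("http://127.0.0.1:5000/", "br"))
# http://127.0.0.1:5000/;br
print(type(make_cache_key(("url", 42), "gzip")).__name__)
# str
```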

@xmcp (Contributor, Author) commented Oct 11, 2024

Oops, sorry for that. I only tested it with the simple cache.

@alexprengere (Collaborator) commented:

@xmcp are you planning to open another PR?

@xmcp (Contributor, Author) commented Oct 11, 2024

I am busy with other things over the next few days. I am okay with minor changes like casting the key to str, but I won't have time to contribute test cases. Feel free to fix this issue yourself.

@alexprengere (Collaborator) commented:

Fixed by e3aeec8

@alexprengere (Collaborator) commented:

For the record, release 1.16 shipped with that fix.
