Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"zyte-" headers filtered httpResponsebody #253

Merged
merged 8 commits into from
Mar 20, 2025

Conversation

GeorgeA92
Copy link
Contributor

"zyte-" headers filtered from httpResponsebody / customHttpRequestHeaders

Copy link

codecov bot commented Feb 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.75%. Comparing base (2248ceb) to head (c0cbc41).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #253      +/-   ##
==========================================
+ Coverage   97.68%   97.75%   +0.06%     
==========================================
  Files          14       14              
  Lines        1728     1778      +50     
  Branches      316      333      +17     
==========================================
+ Hits         1688     1738      +50     
  Misses         18       18              
  Partials       22       22              
Files with missing lines Coverage Δ
scrapy_zyte_api/_params.py 100.00% <100.00%> (ø)
🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Gallaecio Gallaecio requested review from kmike and wRAR March 3, 2025 08:29
@Gallaecio
Copy link
Contributor

This only drops headers during mapping from Request.headers. Should we also detect and drop such headers in customHttpRequestHeaders? And if so, should we also automatically map them to HTTP API parameter? Should we have a different behavior when using zyte_api instead of zyte_api_automap?

@GeorgeA92
Copy link
Contributor Author

@Gallaecio

This only drops headers during mapping from Request.headers. Should we also detect and drop such headers in customHttpRequestHeaders?

As far as I understand _iter_headers where check condition updated affects both Request.headers and customHttpRequestHeaders as _iter_headers called on both _map.. calls here:

def _map_request_headers(
*,
api_params: Dict[str, Any],
request: Request,
browser_headers: Dict[bytes, str],
browser_ignore_headers: SKIP_HEADER_T,
):
request_headers = {}
for k, lowercase_k, v in _iter_headers(
api_params=api_params,
request=request,
header_parameter="requestHeaders",
):
key = browser_headers.get(lowercase_k)

and here:

def _map_custom_http_request_headers(
*,
api_params: Dict[str, Any],
request: Request,
skip_headers: SKIP_HEADER_T,
):
headers = []
for k, lowercase_k, v in _iter_headers(
api_params=api_params,
request=request,
header_parameter="customHttpRequestHeaders",
):
if skip_headers.get(lowercase_k) in (ANY_VALUE, v):
continue
headers.append({"name": k.decode(), "value": v.decode()})
if headers:
api_params["customHttpRequestHeaders"] = headers

@Gallaecio
Copy link
Contributor

Those calls are to map both the requestHeaders (browser) and customHttpRequestHeaders (HTTP) fields. But if you manually set customHttpRequestHeaders on a request, e.g. "zyte_api_automap": {"customHttpRequestHeaders": […]}, those will not be parsed, they are taken as is, no automatic header mapping is done, no header dropping, and a warning is logged encouraging the use of Request.headers instead.

@GeorgeA92
Copy link
Contributor Author

@Gallaecio

But if you manually set customHttpRequestHeaders on a request, e.g. "zyte_api_automap": {"customHttpRequestHeaders": […]}, those will not be parsed, they are taken as is, no automatic header mapping is done, no header dropping, and a warning is logged encouraging the use of Request.headers instead.

On my attempt to reproduce this I got 400 Bad request (while indeed this request passed middleware/handler checks)

Then _map_custom_http_request_headers call needs to be moved or called again somewhere on handler side (right before request will be sent) to cover values received from automap too.

@Gallaecio Gallaecio merged commit 861d597 into scrapy-plugins:main Mar 20, 2025
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants