Problem Description
I do not understand why `scrape_url()` supports headers for authenticating against websites, but `extract()` does not. As things stand, I either need `extract()` to accept auth headers, or I have to grab the data with `scrape_url()` and headers and then pass the result into the extract feature just to format it.
Even with `scrape_url()`, passing the auth headers from a logged-in browser then trips a CAPTCHA, and we seemingly have no recourse. Is there some way we can guard against fingerprinting? And if the CAPTCHA is inevitable, could it be returned via the API for us to solve?
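For concreteness, here is a minimal sketch of the asymmetry, assuming the Firecrawl Python SDK; the exact parameter shapes vary between SDK versions and the URL/headers are placeholders, so treat the signatures as illustrative rather than authoritative:

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-...")

# Headers copied from a logged-in browser session (illustrative values).
auth_headers = {"Cookie": "session=..."}

# scrape_url() lets me pass the auth headers, so the page itself is reachable
# (at least until the CAPTCHA / fingerprinting kicks in).
page = app.scrape_url(
    "https://example.com/account/orders",
    params={"headers": auth_headers},
)

# extract() has no equivalent option, so the same authenticated page cannot be
# targeted directly; I can only hand it the already-scraped content myself.
result = app.extract(
    urls=["https://example.com/account/orders"],
    params={"prompt": "List each order with its date and total."},
    # no way to attach auth_headers here today
)
```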
Proposed Feature
Please add a `headers` option to `ExtractParams` so that `extract()` can authenticate the same way `scrape_url()` already does.
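Something along these lines is what I have in mind; the `headers` field is the proposed addition, and the import path and other field names are assumptions based on the current Python SDK:

```python
from firecrawl import FirecrawlApp, ExtractParams

app = FirecrawlApp(api_key="fc-...")

data = app.extract(
    urls=["https://example.com/account/orders"],
    params=ExtractParams(
        prompt="List each order with its date and total.",
        headers={"Cookie": "session=..."},  # proposed field, not available today
    ),
)
```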
Alternatives Considered
There is not much I can do about it, other than the two-step `scrape_url()` plus extract workaround described above.
Implementation Suggestions
Would it be any more complicated than adding a `headers` param and passing it through to the scraper engine, attaching the headers to the scraper's requests?
Use Case
A site with authentication that needs to be scraped. This is just a subset of the actual issue: Firecrawl does not support logging in, and the agent is generally disobedient. You cannot ask the LLM in the prompt to enter your username and password and log in. It will not even obey instructions like "From this URL, click on the first item in the table under the column 'Name'". It would be helpful if there were a verbose debugging mode that gave us some visibility into what is happening behind the scenes.
On top of that, there is no `prompt` parameter available in `scrape_url()`, so I cannot use `scrape_url()` for this either.
Thanks for hearing me out.
Additional Context
N/A