How do I run scrapy-splash on a Linux virtual machine? I have a Lua script that sends keys to a site to log in and then scrapes it.
I have installed Docker, but I cannot get the scraper to work because it won't connect to the Splash server.
Are there simple steps I can follow to get this working on a VM? Specifically, what should I install, and what should I do before running scrapy crawl spider?
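One check that can be run before `scrapy crawl spider` is whether Splash is reachable at all from the VM. The sketch below is a minimal example, assuming Splash is published on `http://localhost:8050` as in the Docker command in this issue; the function name `splash_is_up` is hypothetical and not part of scrapy-splash. It calls Splash's `_ping` HTTP endpoint using only the standard library.

```python
import urllib.error
import urllib.request


def splash_is_up(base_url="http://localhost:8050", timeout=5.0):
    """Return True if Splash answers its /_ping endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/_ping", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error reply.
        return False
```

If `splash_is_up()` returns `False`, the container is probably not running or the port is not published; if it returns `True`, any remaining failure is more likely in the spider or the Lua script than in the installation.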
As for Docker, I ran the following in admin mode:
docker run -p 8050:8050 scrapinghub/splash --max-timeout 3600
However, this runs in the foreground and I'd like it to run in the background. I cannot figure out how to do this; I have tried:
docker run -d 8050:8050 scrapinghub/splash --max-timeout 3600
But I just get the error:
Unable to find image '8050:8050' locally
Running it in the background may solve my issue, or I may need further installations. Please let me know; I would appreciate expert guidance on this.
I opened a second terminal session while Docker was running in the first.
I get the following error when running the Scrapy crawler:
2022-02-16 02:55:26 [scrapy_splash.middleware] WARNING: Bad request to Splash: {'error': 400, 'type': 'ScriptError', 'description': 'Error happened while executing Lua script', 'info': {'type': 'JS_ERROR', 'js_error_type': 'TypeError', 'js_error_message': 'null is not an object (evaluating \'document.querySelector("button:nth-child(2)").getClientRects\')', 'js_error': 'TypeError: null is not an object (evaluating \'document.querySelector("button:nth-child(2)").getClientRects\')', 'message': '[string "..."]:12: error during JS function call: \'TypeError: null is not an object (evaluating \\\'document.querySelector("button:nth-child(2)").getClientRects\\\')\'', 'source': '[string "..."]', 'line_number': 12, 'error': 'error during JS function call: \'TypeError: null is not an object (evaluating \\\'document.querySelector("button:nth-child(2)").getClientRects\\\')\''}}
2022-02-16 02:55:26 [scrapy.core.engine] DEBUG: Crawled (400) <GET http://instagram.com/ via http://localhost:8050/execute> (referer: None)
2022-02-16 02:55:26 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <400 http://instagram.com/>: HTTP status code is not handled or not allowed
The scraper works perfectly on my Mac, so I assume I am missing an installation step somewhere on the VM.
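Note that the log shows a 400 response coming back from `http://localhost:8050/execute`, which suggests Splash itself was reachable during this run. The failure is a JavaScript error raised at line 12 of the Lua script: `document.querySelector("button:nth-child(2)")` returned null, meaning the button was not in the page when the script tried to use it. One possible cause, stated here as an assumption, is that the page served to the VM loads more slowly or has a different layout than on the Mac. The sketch below is not the original script (which is not shown in this issue); it is a hypothetical Lua script, wrapped in Python, that polls for the element with `splash:select` before clicking and returns the HTML when the element never appears. The names `WAIT_AND_CLICK_LUA`, `build_execute_payload`, and `run_on_splash` are illustrative.

```python
import json
import urllib.request

# Lua script for Splash: poll for the element instead of assuming it exists.
WAIT_AND_CLICK_LUA = """
function main(splash, args)
  assert(splash:go(args.url))
  local element = nil
  for _ = 1, args.attempts do
    element = splash:select(args.selector)
    if element then break end
    splash:wait(0.5)
  end
  if not element then
    -- Return the HTML so the page Splash actually saw can be inspected.
    return {found = false, html = splash:html()}
  end
  element:mouse_click()
  splash:wait(1)
  return {found = true, html = splash:html()}
end
"""


def build_execute_payload(url, selector, attempts=20):
    """Build the JSON body for Splash's /execute endpoint."""
    return {
        "lua_source": WAIT_AND_CLICK_LUA,
        "url": url,
        "selector": selector,
        "attempts": attempts,
    }


def run_on_splash(payload, splash_url="http://localhost:8050", timeout=90):
    """POST the payload to Splash and return the decoded JSON result."""
    request = urllib.request.Request(
        f"{splash_url}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `run_on_splash(build_execute_payload("http://instagram.com/", "button:nth-child(2)"))` would report `found = false` along with the rendered HTML if the button never appears, which makes it possible to compare what the VM receives with what the Mac receives. Inside a spider, the same Lua source could instead be passed through `SplashRequest` with `endpoint='execute'` and `args={'lua_source': ...}`.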