Can source part use selenium driver.page_source? #938

Open
JamesGGGG opened this issue Feb 27, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@JamesGGGG

Describe the bug
Passing an HTML string as source raises an error about a missing "content" variable.

smart_scraper_graph = SmartScraperGraph(
    prompt="summarize information in this page.",
    # also accepts a string with the already downloaded HTML code
    source=driver.page_source,
    config=graph_config
)
result = smart_scraper_graph.run()

error: 'Input to PromptTemplate is missing variables {'"content"'}. Expected: ['"content"', 'question'] Received: ['question']\nNote: if you intended {"content"} to be part of the string and not a variable, please escape it with double curly braces like: '{{"content"}}'.\nFor troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/INVALID_PROMPT_INPUT '
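For anyone puzzling over the message: it says the prompt template found a placeholder literally named "content" (quotes included) and no value was supplied for it. The sketch below reproduces that effect with Python's standard string.Formatter rather than with LangChain itself, on the assumption (suggested by the hint in the message) that PromptTemplate follows the same brace rules. The two template strings are made up for illustration and are not taken from scrapegraphai.

```python
from string import Formatter


def template_variables(template: str) -> list[str]:
    """Return the placeholder names that str.format-style parsing finds."""
    return [name for _, name, _, _ in Formatter().parse(template) if name is not None]


# Hypothetical templates, not taken from scrapegraphai's source.
unescaped = 'Context: {"content"} Question: {question}'
escaped = 'Context: {{"content"}} Question: {question}'

print(template_variables(unescaped))  # the quoted name is parsed as a variable
print(template_variables(escaped))    # doubled braces are literal text

try:
    unescaped.format(question="summarize this page")
except KeyError as exc:
    # Only "question" was supplied, so the quoted name cannot be filled in.
    print("missing variable:", exc)
```

If this reading is right, the problem is in how the prompt template is built inside the library, not in the HTML passed as source.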

Other way:
html_file_path = self.save_page_source(headless_driver, "information")
with open(html_file_path, 'r', encoding='utf-8') as file:
    html_content = file.read()
smart_scraper_graph = SmartScraperGraph(
    prompt="summarize information in this page.",
    # also accepts a string with the already downloaded HTML code
    source=html_content,
    config=graph_config
)

result = smart_scraper_graph.run()

The error remains the same.

Desktop (please complete the following information):

  • OS: macOS
  • Browser: Chrome
  • Version

Additional context
I created a Selenium driver that opens a URL, logs in, clicks some buttons, and reaches a page. I hope scrapegraphai can help me scrape the information on that page.


dosubot bot commented Feb 27, 2025

Hey @JamesGGGG! I'm Dosu, a bot here to help you with bugs, questions, and becoming a contributor. I'll assist you while you wait for a human maintainer. 😊

To resolve the 'Input to PromptTemplate is missing variables {"content"}' error when using SmartScraperGraph with Selenium's driver.page_source or a saved HTML file, you can try downgrading the scrapegraphai library to version 1.37.0. This has been suggested as a solution for similar issues. You can do this by running the following command:

pip install scrapegraphai==1.37.0

Additionally, ensure that the source parameter is correctly receiving the HTML content and that all required variables are provided to the PromptTemplate [1].
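Before and after pinning, it can help to confirm which version is actually installed in the environment that runs the scraper. A minimal sketch using only the standard library follows; the helper name installed_version is made up here, and 1.37.0 is simply the version suggested above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


SUGGESTED = "1.37.0"  # the version suggested in the comment above

found = installed_version("scrapegraphai")
if found is None:
    print("scrapegraphai is not installed in this environment")
elif found != SUGGESTED:
    print(f"scrapegraphai {found} is installed; the suggestion was {SUGGESTED}")
else:
    print(f"scrapegraphai {found} matches the suggested version")
```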


@dosubot dosubot bot added the bug Something isn't working label Feb 27, 2025
@JamesGGGG
Author

It worked. Another question: I want to fill a data class with the scraped information. Is that possible, and if so, what do I need to do?

@VinciGit00
Collaborator

What do you mean by information to fill a data class? You can specify an output schema if you want.
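One way to read the question is sketched below: take the dictionary that run() returns and copy the matching keys into a dataclass. This is an illustration under assumptions, not the library's own mechanism. The PageInfo class, its field names, the fill_dataclass helper, and the sample result are all hypothetical, and it assumes run() returns a plain dict. The output schema mentioned above is, as far as I know, a Pydantic model passed through the schema argument of SmartScraperGraph; check the documentation for your installed version.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class PageInfo:
    """Hypothetical target class; the field names are examples only."""
    title: str
    summary: str


def fill_dataclass(cls, data: dict[str, Any]):
    """Build cls from the keys in data that match its fields, ignoring the rest."""
    known = {f.name for f in fields(cls)}
    return cls(**{key: value for key, value in data.items() if key in known})


# Stand-in for the dict assumed to come back from smart_scraper_graph.run().
result = {
    "title": "Account overview",
    "summary": "Three open invoices.",
    "extra": "not part of the dataclass, so it is dropped",
}

info = fill_dataclass(PageInfo, result)
print(info)
```

If a required field is missing from the result, the constructor raises TypeError, which is usually a useful signal that the prompt or schema needs adjusting.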
