I'm working on a personal project where I need to retrieve the title and HTML content of a webpage (a simple task).
Sometimes, the URL I visit has protections like cookies, but the HTML content is already fully loaded, so I don’t actually care about the cookie. All the information I need is in the HTML.
Here’s my problem:
When I try to evaluate the title of the page, I often get a timeout error.
To handle this, I retry the evaluation once after the first failure, but the retry times out as well and I can't do more than that.
What’s confusing is that if I manually check the title in the browser’s console, it works perfectly. However, when I try to retrieve the title programmatically in PHP, it doesn’t work.
Does anyone know why this might happen or how I can fix it?
if ($urls) {
/** @var UrlScrapedByKeyword[] $urls */
foreach ($urls as $key => $url) {
$urlScrapped = ltrim($url->getUrl(), './');
// $urlScrapped = $urlScrapped;
$browser = $this->createBrowser();
$page = $browser->createPage();
$html = false;
try {
$page->navigate($urlScrapped, ['strict'])->waitForNavigation(Page::INTERACTIVE_TIME, 6000);
$page->evaluate("console.log(document.title)");
// -> this is where my code crashes, so I catch the error below
$pageTitle = $page->evaluate('document.title')->waitForResponse()->getReturnValue();
if ($pageTitle == 'Before you continue')
{
$this->AcceptGoogleCookies($page);
$pageTitle = $page->evaluate('document.title')->waitForResponse()->getReturnValue();
}
echo($pageTitle.' from normal way');
$pageContent = $page->getHtml(2500);
sleep(1);
if ($pageContent) echo('content OK');
if ($pageTitle == 'Before you continue') $this->AcceptGoogleCookies($page);
} catch (OperationTimedOut $e) {
// In the browser console, I can see this call work correctly
$page->evaluate("console.log(document.title)");
// !!---- catch the error and retry evaluating the title, but it crashes again ----!!
$pageTitle = $page->evaluate('document.title')->getReturnValue();
if ($pageTitle == 'Before you continue') $this->AcceptGoogleCookies($page);
echo $pageTitle.' from error';
$pageContent = $page->getHtml(2500);
sleep(1);
if ($pageContent) echo('content OK from error');
} catch (NavigationExpired $e) {
echo "NavigationExpired error while evaluating the title: $pageTitle<br>";
}
}
}`
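For reference, the single retry in the `catch` block above could be generalized into a bounded retry loop. This is only a minimal sketch under stated assumptions: `retry` is a hypothetical helper, not part of chrome-php, and the closure is a stand-in for the `$page->evaluate('document.title')->getReturnValue()` call. It assumes PHP 8.0+ for the `mixed` return type.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of chrome-php): run $operation up to
 * $maxAttempts times, pausing $delayMs milliseconds between attempts,
 * and rethrow the last exception if every attempt fails.
 */
function retry(callable $operation, int $maxAttempts = 3, int $delayMs = 500): mixed
{
    if ($maxAttempts < 1) {
        throw new InvalidArgumentException('maxAttempts must be at least 1');
    }

    $lastError = null;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        try {
            // The attempt number is passed in so the callback can log it.
            return $operation($attempt);
        } catch (Throwable $e) {
            $lastError = $e;
            // Only sleep if another attempt is still to come.
            if ($attempt < $maxAttempts) {
                usleep($delayMs * 1000);
            }
        }
    }

    throw $lastError;
}

// Stand-in for the real title evaluation: fails twice, then succeeds.
$title = retry(function (int $attempt): string {
    if ($attempt < 3) {
        throw new RuntimeException('simulated timeout');
    }
    return 'Example title';
}, 3, 10);
```

In the real loop the closure would wrap the title evaluation, and the `catch` would probably be narrowed to `OperationTimedOut` so that unrelated errors are not retried. If the evaluation still times out on every attempt, retrying alone will not fix the underlying cause.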
Hi community,
Thanks for your help!
Here is my code:
` $urls = $urlScrapedByKeyWordRepository->findBy(['isUsedForGeneration' => false]);
shuffle($urls);
$urls = array_slice($urls, 0, 2);