What can I do when I get an operation timeout? #679

Open
karimWorldSpace opened this issue Jan 26, 2025 · 0 comments

Hi community,

I'm working on a personal project where I need to retrieve the title and HTML content of a webpage (a simple task).

Sometimes the URL I visit sits behind something like a cookie consent prompt, but the HTML content is already fully loaded, so I don't actually care about the prompt. All the information I need is in the HTML.

Here’s my problem:

  1. When I try to evaluate the title of the page, I often get a timeout error.
  2. To handle this, I catch the error and retry once, but the retry fails as well and I don't see what else I can do.
  3. What's confusing is that if I check the title manually in the browser's console, it works perfectly. However, when I try to retrieve the title programmatically from PHP, it doesn't work.
  4. Does anyone know why this might happen or how I can fix it?

Thanks for your help!

Here is my code:

```php
$urls = $urlScrapedByKeyWordRepository->findBy(['isUsedForGeneration' => false]);
shuffle($urls);
$urls = array_slice($urls, 0, 2);

if ($urls) {
    /** @var UrlScrapedByKeyword[] $urls */
    foreach ($urls as $key => $url) {
        $urlScrapped = ltrim($url->getUrl(), './');

        $browser = $this->createBrowser();
        $page = $browser->createPage();
        $html = false;

        try {
            $page->navigate($urlScrapped, ['strict'])->waitForNavigation(Page::INTERACTIVE_TIME, 6000);
            $page->evaluate("console.log('document.title')");

            // This is where my code crashes, so I catch the error below
            $pageTitle = $page->evaluate('document.title')->waitForResponse()->getReturnValue();

            if ($pageTitle == 'Before you continue') {
                $this->AcceptGoogleCookies($page);
                $pageTitle = $page->evaluate('document.title')->waitForResponse()->getReturnValue();
            }

            echo($pageTitle.' from normal way');
            $pageContent = $page->getHtml(2500);
            sleep(1);
            if ($pageContent) echo('content OK');

            if ($pageTitle == 'Before you continue') $this->AcceptGoogleCookies($page);
        } catch (OperationTimedOut $e) {
            // In the browser's console I can see that this call works correctly
            $page->evaluate("console.log(document.title)");

            // Catch the error and retry evaluating the title, but it crashes again
            $pageTitle = $page->evaluate('document.title')->getReturnValue();

            if ($pageTitle == 'Before you continue') $this->AcceptGoogleCookies($page);

            echo $pageTitle.' from error';
            $pageContent = $page->getHtml(2500);
            sleep(1);
            if ($pageContent) echo('content OK from error');
        } catch (NavigationExpired $e) {
            echo "NavigationExpired error while evaluating the title: $pageTitle<br>";
        }
    }
}
```
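
For reference, the "retry after the first failure" step could be pulled into a small helper so that the second attempt is also protected by a `try`/`catch`. This is only a rough sketch: `retryOn` is a hypothetical function, not part of chrome-php, and the flaky call is simulated with a plain `RuntimeException` so the snippet runs on its own. In the real code the exception class would presumably be `OperationTimedOut` and the callback would wrap the `document.title` evaluation.

```php
<?php

/**
 * Runs $operation up to $maxAttempts times, retrying only when it throws
 * an exception of class $retryOn. Any other exception propagates at once.
 * Hypothetical helper, not part of chrome-php.
 */
function retryOn(string $retryOn, int $maxAttempts, callable $operation, int $delayMs = 0)
{
    if ($maxAttempts < 1) {
        throw new InvalidArgumentException('maxAttempts must be at least 1');
    }

    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation($attempt);
        } catch (Throwable $e) {
            // Rethrow unrelated errors, and the last failure once attempts run out.
            if (!($e instanceof $retryOn) || $attempt >= $maxAttempts) {
                throw $e;
            }
            if ($delayMs > 0) {
                usleep($delayMs * 1000); // pause before the next attempt
            }
        }
    }
}

// Stand-in for a flaky call: fails on attempts 1 and 2, succeeds on attempt 3.
$calls = 0;
$title = retryOn(RuntimeException::class, 3, function (int $attempt) use (&$calls): string {
    $calls++;
    if ($attempt < 3) {
        throw new RuntimeException('simulated timeout');
    }
    return 'Example Domain';
});

echo $title, ' after ', $calls, " attempts\n";
```

With something like this, the loop body might call `retryOn(OperationTimedOut::class, 3, fn () => $page->evaluate('document.title')->getReturnValue())`, and a final failure would still surface as an exception instead of crashing inside the `catch` block. It does not explain why the evaluation times out in the first place.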