Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] A way to prevent SIGINT (cmd+c) being passed to the browser processes so that they can be shutdown cleanly. #1170

Open
samwillis opened this issue Feb 24, 2022 · 13 comments

Comments

@samwillis
Copy link

I have been using Playwright with the Scrapy web scraping framework, this is the plugin: https://github.com/scrapy-plugins/scrapy-playwright

Scrapy is designed to cleanly shutdown on SIGINT, saving its crawl state so that it can be resumed. When used with Playwright the SIGINT immediately closes the browsers (with both handle_sigint=False or True), this results in the currently processing pages crashing and the shutdown failing with the state not being saved. The ideal way to fix this is to prevent the SIGINT from being passed to the browsers so that scrapy-playwright can handle their clean shutdown as active pages finish being processed.

I have successfully made this work on posix by monkey patching Playwright based on this from stackoverflow to include a preexec_fn:

# Monkey patch Playwright
from playwright._impl._transport import PipeTransport, _get_stderr_fileno

def new_pipe_transport_connect_preexec(): # Don't forward signals.
    os.setpgrp()

async def new_pipe_transport_connect(self):
    self._stopped_future: asyncio.Future = asyncio.Future()
    # Hide the command-line window on Windows when using Pythonw.exe
    creationflags = 0
    if sys.platform == "win32" and sys.stdout is None:
        creationflags = subprocess.CREATE_NO_WINDOW

    try:
        # For pyinstaller
        env = os.environ.copy()
        if getattr(sys, "frozen", False):
            env.setdefault("PLAYWRIGHT_BROWSERS_PATH", "0")

        self._proc = await asyncio.create_subprocess_exec(
            str(self._driver_executable),
            "run-driver",
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=_get_stderr_fileno(),
            limit=32768,
            creationflags=creationflags,
            env=env,
            preexec_fn=new_pipe_transport_connect_preexec,  # My new line!!
        )
    except Exception as exc:
        self.on_error_future.set_exception(exc)
        raise exc

    self._output = self._proc.stdin

PipeTransport.connect = new_pipe_transport_connect

What would be ideal is either a flag that could be set to enable this (for both Posix and Windows) or a way to pass a custom preexec_fn when starting a browser.

See the related bug on scrapy-playwright scrapy-plugins/scrapy-playwright#62

@zidokobik
Copy link

Having the same issue where SIGINT closes the browser even with handle_sigint=False. I have a long running program that would appreciate graceful shutdown, I do so with loop.add_signal_handler() in asyncio, but SIGINT still kills the browser process.

@Rafiot
Copy link

Rafiot commented Jan 21, 2023

Not as handy as SIGINT, but I use SIGTERM and it works doesn't kill the browser: https://github.com/ail-project/lacus/blob/main/bin/capture_manager.py#L66 & https://github.com/ail-project/lacus/blob/main/bin/stop_capture_manager.py

@blksith0
Copy link

while True:
  do_stuff(playwright)
  sleep(long_time)

Look at this beautiful loop. I cancel it with Ctrl+C, then I handle the KeyboardInterrupt exception.
Except Ctrl+C messes with the playwright browser. That's not nice.
Alright, let's do handle_sigint=False, handle_sigterm=False
Alas, to no avail. God please send divine knowledge and wisdom to my python code.

@Rafiot
Copy link

Rafiot commented Jan 26, 2023

@blksith0 as far as I can tell, it is a bug in Playwright, you cannot solve it on your side.

If I may, I'd recommend you to use SIGTERM instead of SIGINT (as I explained in the message above), and it then works fine. But you cannot use Ctrl+C.

@blksith0
Copy link

@Rafiot Yes, I see that. I'm just not smart enough to know how I can send a SIGTERM to my program and use that to stop the loop.

@blksith0
Copy link

blksith0 commented Mar 6, 2023

It would be really great if I could terminate my loop and not have it kill the playwright browser.
Can you please look into this?

@ustulation
Copy link

While I'm aware what I'm going to write is not an exact solution to the mentioned problem, I've got a slightly different perspective due to the "other" stuff going on with playwright. Exactly like yous I needed these feature, because I wanted to handle playwright error separately and do something except for ctrl-c etc (mainly SIGINTs and SIGTERMs). I would expect them to generate asyncio.CancelledError which doesn't derive from Exception and is thus not accidentally caught unless you wanted to, and hence ending the application gracefully. That doesn't happen because while every other await gets the cancelled-error, awaits on playwright stuff gets a normal playwright-error about target being closed and so on.

However, playwright has another bug (hopefully soon to be solved after 2.5 yrs of it being an open issue) - it mem-leaks. It's a big thread but I face most problems mentioned there for long running applications. The TL;DR from there is that the browser will unexpectedly close after certain num of context-creates and destroys and mem just keeps rising (so it sometimes closes and sometimes crashes with the app with OOM in dockers/fargates but amounts to the same thing - the browser's gone).

Which means that the browser close not only happens on ctrl-c etc. but can happen due to other bugs like above. So instead of waiting for the solution from here (for ctrl-c to not raise playwright error due to browser close), I ended up doing the less desirable thing of catching playwright error, inspecting the string message and if it's anything like regex r'Target.*close.*' (because there seems to be some permutations like Target closed., Target page, context or browser closed. and so on) then raise the asyncio.CancelledError myself from that exception.

Ofc this may not be suitable for everyone as their use case might be different, but for someone who otherwise wants to do something separate for playwright-errors except browser close (e.g you might want to backoff and retry if doing a web-scrape) and interpret browser-close as an unexpected irrecoverable scenario, then this is perhaps the most straightforward solution (unless somebody knows a better way ofc).

@Rafiot
Copy link

Rafiot commented Oct 4, 2023

@ustulation that's also what I'm doing. In case you want a pretty complete list of possible errors, I have one there: https://github.com/Lookyloo/PlaywrightCapture/blob/main/playwrightcapture/capture.py#L837

@KRRT7
Copy link
Contributor

KRRT7 commented Apr 12, 2024

following.

@telecran-telecrit
Copy link

telecran-telecrit commented Apr 29, 2024

Still not fixed! Can not stop playwright correctly after KeyboardInterrupt, and infinite awation produced.

    finally:
        print('Closing...')
        _BLOCK_IMAGES = _BLOCK_IMAGES_old
        try:
            if (context is not None):
                await context.close()
                context = None
            if (playwright is not None):
                await playwright.stop()
                playwright = None
            print(context, playwright)
        except:
            pass
        print('Done.[Closing]')

@junyeong-huray
Copy link

Hello folks,

Here's an OS tip related to this issue.

Have you heard about PGID (Process Group ID)?
When you execute your program in a terminal, and your program fork()-ed
multiple child processes, than all of the processes have the same PGID unless
you change it explicitly.

The OS kernel knows your processes are running at the foreground of the
terminal and it also knows the PGID of them.

IF you type Ctrl+c in a terminal, the terminal driver of the OS kernel sends
SIGINT signals to all processes that belong to the PGID.

If your program imports playwright package, and it creates playwright
instances, some of playwright scripts are executed in a child process, and
browser process runs in a child process also.

In this setup, if you hit Ctrl+c, then all of these processes receive SIGINT
signal simultaneously. Even if your main program handles SIGINT signal and does
some cleanup including shutting down playwright, you can not stop SIGINT not to
signal the child processes.

So in this case, just send SIGTERM to your main process instead of hitting
ctrl+c in the terminal.

I hope this tip helps.

@ezyang
Copy link

ezyang commented Feb 4, 2025

Note that if you want to use the monkeypatch, the latest version of playwright has slightly different code for launching the driver:

            executable_path, entrypoint_path = compute_driver_executable()
            self._proc = await asyncio.create_subprocess_exec(
                executable_path,
                entrypoint_path,
                "run-driver",

Also, annoyingly, when you stop sending the signal to the browser, sometimes the event loop is unable to close itself in a timely manner (it takes 10s of seconds to actually close).

@BoPeng
Copy link

BoPeng commented Feb 5, 2025

Just keep myself in the loop for the issue here. I have the same issue when

try:
   browser = ...
   nice_exit()
except KeyboardInterruption
   harsh_exit()
finally:
   browser.close()

triggers long error messages when the program is terminated by Ctrl-C. It looks like when KeyboardInterruption happens, the browser are killed instantly, leaving the browser in a non-functional status and exceptions from browser.close().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests