pytest hangs while running tests #110
those processes are indeed xdist slaves.
I don't know if it's related, but I've seen parallel runs with xdist hang for a long time on my Jenkins, with the most recent output being:
@RonnyPfannschmidt, what kind of information should I provide? It happened only once; I was not able to reproduce it a second time.
@telles-simbiose all involved packages and versions; it would also help if we could take a look at the test suite. Deadlocks triggered by rare race conditions are not uncommon in distributed systems, and xdist running slaves is a distributed system.
We commonly (about 1 in 20 runs) get this issue on Jenkins and on local machines.
We see a similar issue here, where tests seem to finish but then still hang. In this specific case,
Yeah, I've suffered from this issue sometimes. --fulltrace shows where it locks up:
Try using pytest-timeout: pytest --timeout=<seconds>. This will kill the hanging test and let the execution move on for you.
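Besides the command-line flag, pytest-timeout also offers a per-test marker. A minimal sketch, assuming the plugin is installed (the 60-second cap is only illustrative):

```python
# test_example.py - sketch using the pytest-timeout plugin
# (pip install pytest-timeout). The marker caps a single test; the
# --timeout flag or the "timeout" ini option caps every test in the run.
import time

import pytest


@pytest.mark.timeout(60)  # fail this test if it runs longer than 60 seconds
def test_does_not_hang_forever():
    time.sleep(1)  # stand-in for real work; a hang here would be aborted
```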
Hey @telles-simbiose and @BlackHobbiT, did you manage to make it work? We suffer from the same issue: the test run hangs at 93% with all the workers busy, and only killing the process in Task Manager lets the run continue. When that worker is killed, the report for that specific test is also lost. Thanks
I'm facing a very weird issue with py.test -vv -n 2 test1: it gets stuck after this.
Ensure that individual tests never block for over 1800 seconds; this also helps to avoid locking up in pytest-xdist parallel testing mode: pytest-dev/pytest-xdist#110
@tamaskakuszi as far as I remember, wiping __pycache__ dirs sometimes helps.
@RonnyPfannschmidt we're also seeing this intermittently. If you'd like access to one of our environments, I can make that happen. Thanks!
@JacobCallahan shoot me more details on the work channel
Any updates about this topic, @RonnyPfannschmidt @JacobCallahan? We are facing this in the CI of the company I work for. We use xdist to run the tests in parallel, and it seems this happens from time to time when fail fast is enabled and the test session is aborted. We are then left with a zombie process stuck running this command, as far as I can tell: https://github.com/pytest-dev/execnet/blame/d7ca9815734a4efb168c3ef997858e38c040fc70/execnet/gateway_io.py#L58 It would make sense, as we are using xdist. I don't really understand what this line is supposed to do, but it looks like some old workaround possibly? I could also create an issue in execnet if that is of any use.
this line bootstraps execnet; the rest is fed as commands over stdio
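For context, a hedged sketch of the shape of that bootstrap pattern (not the actual execnet code): the parent starts a bare interpreter whose only job is to read one line from stdin, eval it into a string of source code, and exec that source; the real worker code arrives over the pipe instead of the command line.

```python
import subprocess
import sys

# Start a child with the same one-liner seen in pstree: it blocks on
# stdin.readline(), eval()s the line into a source string, then exec()s it.
child = subprocess.Popen(
    [sys.executable, "-c", "import sys;exec(eval(sys.stdin.readline()))"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

# repr() turns the payload into a single eval()-able line, which is exactly
# what the child's readline()/eval() pair expects.
payload = "print('worker bootstrapped')"
child.stdin.write((repr(payload) + "\n").encode())
child.stdin.flush()
child.stdin.close()

print(child.stdout.read().decode())  # -> worker bootstrapped
child.wait()
```

A wedged worker looks like a child of this kind sitting in that blocking read with nothing left on the other end of the pipe.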
Okay, any idea why processes can be left hanging on that command? The command shows up in pstree and is waiting to read a file descriptor, I guess stdin, but nothing is being written there by any process.
That seems like the control process died and the worker is waiting for the shutdown command. Fetching a stack trace with gdb is only partially helpful, as the IO is handled in multiple threads and the state of the worker is unclear.
Thanks for your time Ronny, I tried digging around with gdb but basically only found a reference back to the code I mentioned. The rest of the trace was in C, so it went a bit over my head.
@Bruniz that's unclear; it's entirely possible the suite is hanging somewhere in C and the shutdown isn't reaching it.
@Bruniz what exactly do you mean by fail fast? A paste of the command plus its output would be a big help.
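For the next time a worker wedges, a lighter-weight alternative to attaching gdb is to register a Python-level stack dump on a signal from conftest.py. A hedged sketch (POSIX only; the choice of SIGUSR1 is arbitrary):

```python
# conftest.py - register a traceback dump that can be triggered from outside
# with `kill -USR1 <worker pid>`, as an alternative to gdb for a stuck
# xdist worker. Runs at import time, so every worker picks it up.
import faulthandler
import signal
import sys

# Dump the Python stacks of all threads in this process to stderr whenever
# the process receives SIGUSR1; the test run itself is not interrupted.
faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)
```

The output may end up interleaved with captured output, but it at least shows where each worker's Python frames are when the hang occurs.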
I'm seeing this problem in our CI now, too.
What other information can I provide?
Here's the log leading up to the hang:
[and nothing more...]
Trying the tests in my development environment, they hang too. Since I'm running interactively there, I get slightly more output from pytest:
Never mind! It turned out that one parametrized scenario of one test was sitting in an infinite loop, which caused the test run to hang a few tests short of the end. Once I found and addressed the problem, the tests no longer hang. That's great, but was there some way I could have found this more easily?
The timeout plugin tends to be a great help for hangups.
Anybody want to contribute a change to the docs mentioning pytest-timeout?
😁 Indeed, especially if it points out the fact that it is hard to figure out which test is hanging without it!
@nicoddemus I'm wondering if xdist should identify all currently running tests and their phases whenever a node exceeds a predetermined timeframe. A further expansion of this might be printing stack traces.
Sounds good @RonnyPfannschmidt, indeed it makes sense for a new option to at least warn the user if a test has been running for X seconds (configurable, perhaps with a reasonable default of, say, 120s). However, I would leave the job of cancelling long-running tests to pytest-timeout.
Indeed, a debugging print is fine, but the terminate gun ought to be opt-in.
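As a rough illustration of the print-only variant discussed here (not an existing xdist option), a conftest.py hook could arm a stack dump that fires only if a test runs past a threshold; the 120-second value is just the default floated above:

```python
# conftest.py - hedged sketch of "warn and print stack traces, but don't
# kill": uses the stdlib faulthandler module around each test call.
import faulthandler

import pytest

WARN_AFTER = 120  # seconds


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_call(item):
    # Arm a traceback dump that triggers only if the test body is still
    # running after WARN_AFTER seconds; exit=False keeps the test alive.
    faulthandler.dump_traceback_later(WARN_AFTER, exit=False)
    try:
        yield
    finally:
        faulthandler.cancel_dump_traceback_later()
```

Recent pytest versions also expose a similar built-in via the faulthandler_timeout ini option.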
F39 pip isn't affected, but on rpm the tests get stuck until the OOM killer triggers. There are multiple similar reports upstream but no fix. To unblock the unit tests, the F39 RPM job will be skipped for now. F39 pip covers py39, py310, py311 and py312. CI jobs that run into the issue: https://jenkins-pagure.apps.ocp.cloud.ci.centos.org/job/pull-requests/276/ https://jenkins-pagure.apps.ocp.cloud.ci.centos.org/job/pull-requests/277/ GitHub issues that report similar problems: pytest-dev/pytest-xdist#110 pytest-dev/pytest-xdist#661 pytest-dev/pytest-xdist#872 pytest-dev/pytest-xdist#1005
We had this also, and it seems that lowering the worker count helps. It would be great if you could set a negative value, i.e. "-n=-1", which could mean "logical minus 1 core", to allow as many cores as possible to be used whilst minimising the deadlocking risk. See the sketch below for one way to approximate this today.
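One workaround along these lines is to override what -n auto resolves to from conftest.py. A hedged sketch, assuming the pytest_xdist_auto_num_workers hook available in recent pytest-xdist releases:

```python
# conftest.py - make `-n auto` mean "logical cores minus one" so one core
# stays free for the controller process.
import os


def pytest_xdist_auto_num_workers(config):
    # Never return fewer than 1 worker, even on a single-core machine.
    return max((os.cpu_count() or 2) - 1, 1)
```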
Hello everyone,
Last night I left my test suite running until this morning, but I noticed that it hadn't finished running all the tests. Looking at htop, I saw some strange processes that had been running for a really long time, as shown in this screenshot. Looking at the test output, the last tests run were all run by the same worker gw2 (there were 4 workers running), and since there were 3 processes import sys;exec(eval(sys.stdin.readline())) running for 13+ hours, I think those 3 workers were just stuck somehow.