Overhead of pytest-xdist is often greater than time saved for parallelization #44

alex · 2016-02-01T04:48:24Z

(I'm pretty sure this is a known issue, but I didn't see an open issue for it, and wanted there to be a canonical place to track it)

Example:

(cryptography) ~/p/cryptography (master) $ py.test tests/test_x509*.py
====================================================== test session starts =======================================================
platform darwin -- Python 2.7.10[pypy-4.0.1-final], pytest-2.8.7, py-1.4.31, pluggy-0.3.1
rootdir: /Users/alex_gaynor/projects/cryptography, inifile: tox.ini
plugins: hypothesis-2.0.0, xdist-1.14
collected 492 items

tests/test_x509.py .........................................................................................................................................................................................
tests/test_x509_crlbuilder.py ............................
tests/test_x509_ext.py .......................................................................................................................................................................................................................................................................
tests/test_x509_revokedcertbuilder.py ................

=================================================== 492 passed in 3.60 seconds ===================================================
(cryptography) ~/p/cryptography (master) $ py.test -n 2 tests/test_x509*.py
====================================================== test session starts =======================================================
platform darwin -- Python 2.7.10[pypy-4.0.1-final], pytest-2.8.7, py-1.4.31, pluggy-0.3.1
rootdir: /Users/alex_gaynor/projects/cryptography, inifile: tox.ini
plugins: hypothesis-2.0.0, xdist-1.14
gw0 [492] / gw1 [492]
scheduling tests via LoadScheduling
............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
=================================================== 492 passed in 5.00 seconds ===================================================
(cryptography) ~/p/cryptography (master) $ py.test -n 2 tests/test_x509*.py
====================================================== test session starts =======================================================
platform darwin -- Python 2.7.10[pypy-4.0.1-final], pytest-2.8.7, py-1.4.31, pluggy-0.3.1
rootdir: /Users/alex_gaynor/projects/cryptography, inifile: tox.ini
plugins: hypothesis-2.0.0, xdist-1.14
gw0 [492] / gw1 [492]
scheduling tests via LoadScheduling
............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
=================================================== 492 passed in 4.62 seconds ===================================================
(cryptography) ~/p/cryptography (master) $ py.test -n 4 tests/test_x509*.py
====================================================== test session starts =======================================================
platform darwin -- Python 2.7.10[pypy-4.0.1-final], pytest-2.8.7, py-1.4.31, pluggy-0.3.1
rootdir: /Users/alex_gaynor/projects/cryptography, inifile: tox.ini
plugins: hypothesis-2.0.0, xdist-1.14
gw0 [492] / gw1 [492] / gw2 [492] / gw3 [492]
scheduling tests via LoadScheduling
............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
=================================================== 492 passed in 6.23 seconds ===================================================
(cryptography) ~/p/cryptography (master) $ py.test -n 4 tests/test_x509*.py
====================================================== test session starts =======================================================
platform darwin -- Python 2.7.10[pypy-4.0.1-final], pytest-2.8.7, py-1.4.31, pluggy-0.3.1
rootdir: /Users/alex_gaynor/projects/cryptography, inifile: tox.ini
plugins: hypothesis-2.0.0, xdist-1.14
gw0 [492] / gw1 [492] / gw2 [492] / gw3 [492]
scheduling tests via LoadScheduling
............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
=================================================== 492 passed in 6.73 seconds ===================================================
(cryptography) ~/p/cryptography (master) $ py.test -n 4 tests/test_x509*.py
====================================================== test session starts =======================================================
platform darwin -- Python 2.7.10[pypy-4.0.1-final], pytest-2.8.7, py-1.4.31, pluggy-0.3.1
rootdir: /Users/alex_gaynor/projects/cryptography, inifile: tox.ini
plugins: hypothesis-2.0.0, xdist-1.14
gw0 [492] / gw1 [492] / gw2 [492] / gw3 [492]
scheduling tests via LoadScheduling
............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
=================================================== 492 passed in 6.53 seconds ===================================================

nonatomiclabs · 2016-02-01T06:16:51Z

I think this is normal for tests that run this quickly, because of the scheduling overhead.
And I guess it's not really an issue, since the whole run takes almost no time anyway.

RonnyPfannschmidt · 2016-02-01T07:02:19Z

@alex we did not actually track that detail

But we do have some different solutions in mind for different reasons

For example file-level granularity at collection time, and scheduling approaches with different kinds of latency in mind

The main blocker for this is turning the scheduler into a set of explicit state machines

nicoddemus · 2016-02-14T16:01:43Z

Hmm, file-level granularity at collection time seems simple to implement, and would eliminate the overhead of all nodes collecting all tests... not sure how big the boost would be for such a small test suite anyway. FWIW we do see linear gains as we add CPUs when test suites start taking more than 2-3 minutes or so.

Not sure if @RonnyPfannschmidt wants to tackle this before the internal refactoring though.
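
To put rough numbers on the overhead discussed above, here is a small sketch using the timings from the original report. It assumes a deliberately naive model, wall time = fixed overhead + serial time / workers, which nobody in this thread proposed and which ignores that collection cost itself grows with the suite. The function names are made up for illustration.

```python
from statistics import mean

# Wall-clock timings (seconds) copied from the report at the top of the thread.
serial = 3.60
parallel_runs = {2: [5.00, 4.62], 4: [6.23, 6.73, 6.53]}


def implied_overhead(serial_seconds, workers, wall_seconds):
    """Overhead under the naive model: wall = overhead + serial / workers."""
    return wall_seconds - serial_seconds / workers


def break_even_serial(overhead_seconds, workers):
    """Serial runtime S at which S == overhead + S / workers."""
    return overhead_seconds * workers / (workers - 1)


# Average the repeated runs for each worker count, then back out the overhead.
overheads = {
    n: implied_overhead(serial, n, mean(times))
    for n, times in parallel_runs.items()
}

for n, overhead in overheads.items():
    print(
        f"-n {n}: implied overhead ~{overhead:.2f}s, "
        f"break-even serial runtime ~{break_even_serial(overhead, n):.2f}s"
    )
```

Under this model the implied overhead is larger than the entire serial run for both worker counts, which fits the observation that gains only show up once a suite runs for minutes rather than seconds.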

vladu · 2018-03-01T21:16:55Z

Is there any work currently being done to implement the "distributed collection", as described above, to avoid every node collecting every test?

nicoddemus · 2018-03-01T21:30:41Z

@vladu I don't think anybody has taken time to work on this, but if anybody wants to start a PR we would be glad to help guide them. 👍

vladu · 2018-03-01T21:48:51Z

I might take a stab at it. I've poked around the code a little bit, and I have some ideas how this might be accomplished, but any suggestions from the experts on a clean solution would be appreciated.

RonnyPfannschmidt · 2018-03-02T06:38:59Z

@vladu currently there is not even groundwork for this. I propose some kind of brainstorming to get a rough idea of starting points for experimentation; we might need a major internal refactoring in the beginning

nicoddemus · 2018-03-02T17:32:18Z

@RonnyPfannschmidt good idea.

One solution I have thought of (borrowed from some other system which I don't recall right now) is that each worker can infer which tests it should collect based on its id and the total number of workers. This can be accomplished easily today because each worker knows its own id and knows how many workers there are in total (based on the PYTEST_XDIST_WORKER_COUNT env variable), so it can implement pytest_ignore_collect to ignore every path except the Nth path that is given to it (assuming all workers receive the paths in the same order).

For example, here's a list of tests in a suite and which worker collects that file (with 3 workers):

tests/test_1.py  # gw0
tests/test_2.py  # gw1
tests/test_3.py  # gw2
tests/test_4.py  # gw0
tests/test_5.py  # gw1
tests/test_6.py  # gw2
...

And so on.

With that working, the master node can then just say to all workers: "hey, run all the tests you have collected" and that's it.

This is of course a quick draft, just throwing this here to see what you guys think.
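
A minimal sketch of the partitioning arithmetic in the draft above, assuming worker ids of the form gw0, gw1, ... as shown in the xdist output earlier in the thread, and assuming the worker id is available in a PYTEST_XDIST_WORKER environment variable alongside PYTEST_XDIST_WORKER_COUNT. The helper names (worker_index, shard, current_shard) are hypothetical, and wiring this into a pytest collection hook is deliberately left out.

```python
import os


def worker_index(worker_id):
    """Turn a worker id such as 'gw2' into the integer 2."""
    if not worker_id.startswith("gw"):
        raise ValueError(f"unexpected worker id: {worker_id!r}")
    return int(worker_id[2:])


def shard(paths, index, count):
    """Return the paths owned by worker `index` out of `count` workers.

    Paths are sorted first so every worker sees the same order,
    then dealt out round-robin.
    """
    ordered = sorted(paths)
    return [path for i, path in enumerate(ordered) if i % count == index]


def current_shard(paths, environ=os.environ):
    """Pick this process's shard, or every path when not running under xdist."""
    worker_id = environ.get("PYTEST_XDIST_WORKER")
    if worker_id is None:
        return sorted(paths)
    count = int(environ["PYTEST_XDIST_WORKER_COUNT"])
    return shard(paths, worker_index(worker_id), count)


if __name__ == "__main__":
    files = [f"tests/test_{n}.py" for n in range(1, 7)]
    for index in range(3):
        print(f"gw{index}:", shard(files, index, 3))
```

Sorting before dealing out the paths is one way to satisfy the "same order on every worker" assumption. Round-robin by file does not balance by runtime, so one slow file can still dominate a worker.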

RonnyPfannschmidt · 2018-03-02T20:25:39Z

@nicoddemus that's one of the ideas I had collected under the banner "pytest-bigtest for xdist"

jdahlin · 2020-08-20T17:04:55Z

Another idea would be to move collection to the master node and have the scheduling pass in tests to the worker nodes. That would also make it possible to solve #586.

RonnyPfannschmidt · 2020-08-20T17:42:48Z

All nodes will have to recollect in any case; nodes are not designed to be network transferable

rth mentioned this issue Jan 28, 2019

CI Uses pytest-xdist to parallelize tests scikit-learn/scikit-learn#13041

Merged
