Overhead of pytest-xdist is often greater than time saved for parallelization #44
Comments
I think that for tests this quick, it is normal because of the scheduling overhead. |
@alex we did not actually track that detail, but we do have some different solutions in mind for different reasons: for example, file-level granularity at collection time, and scheduling approaches with different kinds of latency in mind. The main blocker for this is turning the scheduler into a set of explicit state machines. |
Hmm, file-level granularity at collection time seems simple to implement, and would eliminate the overhead of all nodes collecting all tests... not sure how big a boost it would be for such a small test suite anyway. FWIW we do see linear gains as we add CPUs when test suites start taking more than 2-3 minutes or so. Not sure if @RonnyPfannschmidt wants to tackle this before the internal refactoring though. |
Is there any work currently being done to implement the "distributed collection", as described above, to avoid every node collecting every test? |
@vladu I don't think anybody has taken time to work on this, but if anybody wants to start a PR we would be glad to help guide them. 👍 |
I might take a stab at it. I've poked around the code a little bit, and I have some ideas how this might be accomplished, but any suggestions from the experts on a clean solution would be appreciated. |
@vladu currently there is not even groundwork for this. I propose some kind of brainstorming to get a rough idea of starting points for experimentation; we might need a major internal refactoring in the beginning. |
@RonnyPfannschmidt good idea. One solution I have thought of (borrowed from some other system which I don't recall right now) is that each worker can infer which tests should be collected based on its id and the total number of workers. This can be accomplished easily today because each worker knows its own id and knows how many workers there are in total (based on …). For example, here's a list of tests in a suite and which worker collects that file (with 3 workers):
And so on. With that working, the … This is of course a quick draft, just throwing this here to see what you guys think. |
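A minimal sketch of the id-based partitioning idea described in the comment above, assuming each worker knows a zero-based index and the total worker count. The names `files_for_worker`, `worker_index`, and `num_workers` are hypothetical and are not part of the pytest-xdist API. Sorting the file list first is an assumption made here so that every worker derives the same assignment independently, without any coordination.

```python
from typing import List, Sequence


def files_for_worker(
    test_files: Sequence[str], worker_index: int, num_workers: int
) -> List[str]:
    """Return the test files one worker would collect.

    Hypothetical helper: files are sorted, then dealt out round-robin,
    so worker k takes every file whose sorted position i satisfies
    i % num_workers == k.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    if not 0 <= worker_index < num_workers:
        raise ValueError("worker_index must be in range(num_workers)")
    # Sort so that all workers agree on the positions of the files.
    ordered = sorted(test_files)
    return [
        path
        for position, path in enumerate(ordered)
        if position % num_workers == worker_index
    ]


if __name__ == "__main__":
    # Illustrative file names only; not taken from any real suite.
    suite = ["test_a.py", "test_b.py", "test_c.py", "test_d.py", "test_e.py"]
    for index in range(3):
        print(index, files_for_worker(suite, index, 3))
```

Every file lands on exactly one worker, so no worker collects the whole suite. The trade-off is that the split ignores how long each file takes to run, so an uneven suite could leave some workers idle while others are still busy.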
@nicoddemus that's one of the ideas I had collected under the banner "pytest-bigtest for xdist" |
Another idea would be to move collection to the |
All nodes will have to recollect in any case; nodes are not designed to be network transferable.
(I'm pretty sure this is a known issue, but I didn't see an open issue for it, and wanted there to be a canonical place to track it)
Example: