Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parallelize clique expansion #764

Open
knaaptime opened this issue Aug 8, 2024 · 3 comments
Open

parallelize clique expansion #764

knaaptime opened this issue Aug 8, 2024 · 3 comments

Comments

@knaaptime
Copy link
Member

on relatively large graphs (n > ~10k) with colocated points, the clique expansion function can get really slow. After cutting it short a few times, I think this for-loop is where things are getting stuck. If that's the bottleneck, I think it should be pretty straightforward to parallelize using something like joblib

@ljwolf
Copy link
Member

ljwolf commented Aug 8, 2024

My original implementation was pure pandas, based on join and explode with no loops. Would it make sense to revisit that?

@knaaptime
Copy link
Member Author

yeah i was wondering about explode there. I presume that would be even faster?

@ljwolf
Copy link
Member

ljwolf commented Aug 8, 2024

I think so.

We moved away from it because (a) hashing the points to establish unique locations was expensive and (b) our policy on sorting was different, but I think we can still use pandas join and explode given the new function signature.

The earlier implementation, conceptually:

  • Built a series indexed on unique locations with values being a list of all the original indices at that location. This used a groupby on the geometries and agg(list) on the ids
  • Explode the aggregated series
  • join the exploded series back to the adjacency table, allowing join to handle the one to many expansion for edges entering/leaving cliques.
  • separately, built the all-pairs links within each clique using a join.
  • stack the within clique and beyond clique tables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants