
Commit 15de782

ENH: Adding and documenting configs in nx-parallel (#75)
* initial commit
* minor docs updates
* style fix
* mv _set_nx_config to decorators.py
* renamed nx_config to active
* style fix
* added _set_nx_config to main namespace
* added F401
* Renamed _set_nx_config to _configure_if_nx_active
* updated Config.md
* Improved Config.md
* improved Config.md
* added _configure_if_nx_active to all funcs
* renamed cpu_count to get_n_jobs
* removing n_jobs from Parallel() because that will be configured using joblib.parallel_config or networkx config
* renaming cpu_count or total_cores to n_jobs
* updated README
* updated docs acc to config
* updated Config.md and README.md based on the review comments
* improved config docs
1 parent a98224c commit 15de782

25 files changed: +441 −188 lines changed

Diff for: CONTRIBUTING.md (+1 −1)
@@ -113,7 +113,7 @@ def parallel_func(G, nx_arg, additional_backend_arg_1, additional_backend_arg_2=
In parallel computing, "chunking" refers to dividing a large task into smaller, more manageable chunks that can be processed simultaneously by multiple computing units, such as CPU cores or distributed computing nodes. It's like breaking down a big task into smaller pieces so that multiple workers can work on different pieces at the same time, and in the case of nx-parallel, this usually speeds up the overall process.

-The default chunking in nx-parallel is done by first determining the number of available CPU cores and then allocating the nodes (or edges or any other iterator) per chunk by dividing the total number of nodes by the total CPU cores available. (ref. [chunk.py](./nx_parallel/utils/chunk.py)). This default chunking can be overridden by the user by passing a custom `get_chunks` function to the algorithm as a kwarg. While adding a new algorithm, you can change this default chunking, if necessary (ref. [PR](https://github.com/networkx/nx-parallel/pull/33)). Also, when [the `config` PR](https://github.com/networkx/networkx/pull/7225) is merged in networkx, and the `config` will be added to nx-parallel, then the user would be able to control the number of CPU cores they would want to use and then the chunking would be done accordingly.
+The default chunking in nx-parallel is done by slicing the list of nodes (or edges or any other iterator) into `n_jobs` number of chunks (ref. [chunk.py](./nx_parallel/utils/chunk.py)). By default, `n_jobs` is `None`. To learn how you can modify the value of `n_jobs` and other config options, refer to [`Config.md`](./Config.md). The default chunking can be overridden by the user by passing a custom `get_chunks` function to the algorithm as a kwarg. While adding a new algorithm, you can change this default chunking, if necessary (ref. [PR](https://github.com/networkx/nx-parallel/pull/33)).
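As a minimal sketch of the `get_chunks` kwarg mentioned above: the chunk size and the graph below are made up for illustration, and the exact iterable each algorithm passes to `get_chunks` may differ, so treat this as an idea rather than a reference implementation.

```python
import networkx as nx
import nx_parallel as nxp

G = nx.fast_gnp_random_graph(100, 0.1, seed=42)


def custom_get_chunks(nodes):
    """Illustrative chunking: yield fixed-size chunks of 25 nodes."""
    nodes = list(nodes)
    for i in range(0, len(nodes), 25):
        yield nodes[i : i + 25]


# Pass the custom chunking function to an nx-parallel algorithm
nxp.betweenness_centrality(G, get_chunks=custom_get_chunks)
```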

## General guidelines on adding a new algorithm

Diff for: Config.md (+156)
@@ -0,0 +1,156 @@
# Configuring nx-parallel

`nx-parallel` provides flexible parallel computing capabilities, allowing you to control settings like `backend`, `n_jobs`, `verbose`, and more. This can be done through two configuration systems: `joblib` and `NetworkX`. This guide explains how to configure `nx-parallel` using both systems.

## 1. Setting configs using `joblib.parallel_config`

`nx-parallel` relies on [`joblib.Parallel`](https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html) for parallel computing. You can adjust its settings through the [`joblib.parallel_config`](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) class provided by `joblib`. For more details, check out the official [joblib documentation](https://joblib.readthedocs.io/en/latest/parallel.html).

### 1.1 Usage

```python
from joblib import parallel_config

# Setting global configs
parallel_config(n_jobs=3, verbose=50)
nx.square_clustering(H)

# Setting configs in a context
with parallel_config(n_jobs=7, verbose=0):
    nx.square_clustering(H)
```

Please refer to the [official joblib documentation](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) to better understand the config parameters.

Note: Ensure that `nx.config.backends.parallel.active = False` when using `joblib` for configuration, as NetworkX configurations will override `joblib.parallel_config` settings if `active` is `True`.
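
For a quick illustration of this note, here is a minimal sketch; it assumes `nx-parallel` is installed (so that `nx.config.backends.parallel` exists) and `G` is just a hypothetical example graph:

```python
import networkx as nx
from joblib import parallel_config

G = nx.erdos_renyi_graph(50, 0.2, seed=1)

# Make sure NetworkX's config is not overriding joblib's settings
nx.config.backends.parallel.active = False

# Now joblib.parallel_config controls the parallel execution
with parallel_config(n_jobs=2, verbose=10):
    nx.square_clustering(G, backend="parallel")
```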

## 2. Setting configs using `networkx`'s configuration system for backends

To use NetworkX's configuration system in `nx-parallel`, you must set the `active` flag (in `nx.config.backends.parallel`) to `True`.

### 2.1 Configs in NetworkX for backends

When you import NetworkX, it automatically sets default configurations for all installed backends, including `nx-parallel`.

```python
import networkx as nx

print(nx.config)
```

Output:

```
NetworkXConfig(
    backend_priority=[],
    backends=Config(
        parallel=ParallelConfig(
            active=False,
            backend="loky",
            n_jobs=None,
            verbose=0,
            temp_folder=None,
            max_nbytes="1M",
            mmap_mode="r",
            prefer=None,
            require=None,
            inner_max_num_threads=None,
            backend_params={},
        )
    ),
    cache_converted_graphs=True,
)
```

As you can see in the above output, `active` is set to `False` by default. So, to enable NetworkX configurations for `nx-parallel`, set `active` to `True`. Please refer to the [NetworkX backend and config docs](https://networkx.org/documentation/latest/reference/backends.html) for more on NetworkX's configuration system.

### 2.2 Usage

```python
# enabling networkx's config for nx-parallel
nx.config.backends.parallel.active = True

# Setting global configs
nxp_config = nx.config.backends.parallel
nxp_config.n_jobs = 3
nxp_config.verbose = 50

nx.square_clustering(H)

# Setting config in a context
with nxp_config(n_jobs=7, verbose=0):
    nx.square_clustering(H)
```

The configuration parameters are the same as those of `joblib.parallel_config`, so you can refer to the [official joblib documentation](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) to better understand these config parameters.

### 2.3 How Does NetworkX's Configuration Work in nx-parallel?

In `nx-parallel`, a `_configure_if_nx_active` decorator is applied to all algorithms. This decorator checks the value of `active` (in `nx.config.backends.parallel`) and then uses the appropriate configuration system (`joblib` or `networkx`) accordingly. If `active=True`, it extracts the configs from `nx.config.backends.parallel`, passes them to a `joblib.parallel_config` context manager, and calls the function within that context. Otherwise, it simply calls the function.
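
For intuition, here is a simplified sketch of what such a decorator could look like. This is not the actual `nx-parallel` implementation; it assumes `nx-parallel` is installed (so `nx.config.backends.parallel` exists) and that the config object behaves like a standard dataclass with the fields shown in section 2.1:

```python
from dataclasses import asdict
from functools import wraps

import joblib
import networkx as nx


def configure_if_nx_active(func):
    """Sketch: run `func` inside a joblib.parallel_config context built
    from nx.config.backends.parallel when `active` is True."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        cfg = nx.config.backends.parallel
        if not cfg.active:
            # Fall back to whatever joblib.parallel_config is in effect
            return func(*args, **kwargs)
        settings = asdict(cfg)
        settings.pop("active")
        backend_params = settings.pop("backend_params")
        with joblib.parallel_config(**settings, **backend_params):
            return func(*args, **kwargs)

    return wrapper
```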

## 3. Comparing NetworkX and Joblib Configuration Systems

### 3.1 Using Both Systems Simultaneously

You can use both NetworkX's configuration system and `joblib.parallel_config` together in `nx-parallel`. However, it's important to understand their interaction.

Example:

```py
# Enable NetworkX configuration
nx.config.backends.parallel.active = True
nx.config.backends.parallel.n_jobs = 6

# Global Joblib configuration
joblib.parallel_config(backend="threading")

with joblib.parallel_config(n_jobs=4, verbose=55):
    # NetworkX config for nx-parallel
    # backend="loky", n_jobs=6, verbose=0
    nx.square_clustering(G, backend="parallel")

    # Joblib config for other parallel tasks
    # backend="threading", n_jobs=4, verbose=55
    joblib.Parallel()(joblib.delayed(sqrt)(i**2) for i in range(10))
```

- **NetworkX Configurations for nx-parallel**: When calling functions within `nx-parallel`, NetworkX's configurations will override those specified by Joblib. For example, the `nx.square_clustering` function will use the `n_jobs=6` setting from `nx.config.backends.parallel`, regardless of any Joblib settings within the same context.

- **Joblib Configurations for Other Code**: For any other parallel code outside of `nx-parallel`, such as a direct call to `joblib.Parallel`, the configurations specified within the Joblib context will be applied.

This behavior ensures that `nx-parallel` functions consistently use NetworkX's settings when enabled, while still allowing Joblib configurations to apply to non-NetworkX parallel tasks.

**Key Takeaway**: When both systems are used together, NetworkX's configuration (`nx.config.backends.parallel`) takes precedence for `nx-parallel` functions. To avoid unexpected behavior, ensure that the `active` setting aligns with your intended configuration system.

### 3.2 Key Differences

- **Parameter Handling**: The main difference is how `backend_params` are passed. Since NetworkX configurations are stored as a [`@dataclass`](https://docs.python.org/3/library/dataclasses.html), they need to be passed as a dictionary, whereas in `joblib.parallel_config` you can just pass them along with the other configurations, as shown below:

```py
nx.config.backends.parallel.backend_params = {"max_nbytes": None}
joblib.parallel_config(backend="loky", max_nbytes=None)
```

- **Default Behavior**: By default, `nx-parallel` looks for configs in `joblib.parallel_config` unless `nx.config.backends.parallel.active` is set to `True`.

### 3.3 When Should You Use Which System?

When `nx-parallel` is the only NetworkX backend you're using, either the NetworkX or the `joblib` configuration system can be used, depending on your preference.

But when working with multiple NetworkX backends, it's crucial to ensure compatibility among the backends to avoid conflicts between different configurations. In such cases, using NetworkX's configuration system to configure `nx-parallel` is recommended. This approach helps maintain consistency across backends. For example:

```python
nx.config.backend_priority = ["another_nx_backend", "parallel"]
nx.config.backends.another_nx_backend.config_1 = "xyz"
joblib.parallel_config(n_jobs=7, verbose=50)

nx.square_clustering(G)
```

In this example, if `another_nx_backend` also internally utilizes `joblib.Parallel` (without exposing it to the user) within its implementation of the `square_clustering` algorithm, then the configurations set via `joblib.parallel_config` for `nx-parallel` will also influence the internal `joblib.Parallel` used by `another_nx_backend`. To prevent such unexpected behavior, it is advisable to configure these settings through the NetworkX configuration system.

**Future Synchronization:** We are working on synchronizing both configuration systems so that changes in one automatically reflect in the other. This started with [PR#68](https://github.com/networkx/nx-parallel/pull/68), which introduced a unified context manager for `nx-parallel`. For more details on the challenges of creating a compatibility layer to keep both systems in sync, refer to [Issue#76](https://github.com/networkx/nx-parallel/issues/76).

If you have feedback or suggestions, feel free to open an issue or submit a pull request.

Thank you :)

Diff for: README.md (+28 −22)
@@ -4,26 +4,26 @@ nx-parallel is a NetworkX backend that uses joblib for parallelization. This pro

## Algorithms in nx-parallel

-- [number_of_isolates](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/isolate.py#L8)
-- [square_clustering](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/cluster.py#L10)
-- [local_efficiency](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/efficiency_measures.py#L9)
-- [closeness_vitality](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/vitality.py#L9)
-- [is_reachable](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L10)
-- [tournament_is_strongly_connected](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L54)
-- [all_pairs_node_connectivity](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/connectivity/connectivity.py#L17)
-- [approximate_all_pairs_node_connectivity](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/approximation/connectivity.py#L12)
-- [betweenness_centrality](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/centrality/betweenness.py#L19)
-- [edge_betweenness_centrality](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/centrality/betweenness.py#L94)
-- [node_redundancy](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/bipartite/redundancy.py#L11)
-- [all_pairs_dijkstra](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L28)
-- [all_pairs_dijkstra_path_length](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L71)
-- [all_pairs_dijkstra_path](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L121)
-- [all_pairs_bellman_ford_path_length](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L164)
-- [all_pairs_bellman_ford_path](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L209)
-- [johnson](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L252)
-- [all_pairs_all_shortest_paths](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/generic.py#L10)
-- [all_pairs_shortest_path_length](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/unweighted.py#L18)
-- [all_pairs_shortest_path](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/unweighted.py#L62)
+- [all_pairs_all_shortest_paths](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/generic.py#L11)
+- [all_pairs_bellman_ford_path](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L212)
+- [all_pairs_bellman_ford_path_length](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L168)
+- [all_pairs_dijkstra](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L29)
+- [all_pairs_dijkstra_path](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L124)
+- [all_pairs_dijkstra_path_length](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L73)
+- [all_pairs_node_connectivity](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/connectivity/connectivity.py#L18)
+- [all_pairs_shortest_path](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/unweighted.py#L63)
+- [all_pairs_shortest_path_length](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/unweighted.py#L19)
+- [approximate_all_pairs_node_connectivity](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/approximation/connectivity.py#L13)
+- [betweenness_centrality](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/centrality/betweenness.py#L20)
+- [closeness_vitality](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/vitality.py#L10)
+- [edge_betweenness_centrality](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/centrality/betweenness.py#L96)
+- [is_reachable](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L13)
+- [johnson](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L256)
+- [local_efficiency](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/efficiency_measures.py#L10)
+- [node_redundancy](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/bipartite/redundancy.py#L12)
+- [number_of_isolates](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/isolate.py#L9)
+- [square_clustering](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/cluster.py#L11)
+- [tournament_is_strongly_connected](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L59)

<details>
<summary>Script used to generate the above list</summary>
@@ -107,6 +107,12 @@ Note that for all functions inside `nx_code.py` that do not have an nx-parallel
import networkx as nx
import nx_parallel as nxp

+# enabling networkx's config for nx-parallel
+nx.config.backends.parallel.active = True
+
+# setting `n_jobs` (by default, `n_jobs=None`)
+nx.config.backends.parallel.n_jobs = 4
+
G = nx.path_graph(4)
H = nxp.ParallelGraph(G)

@@ -121,10 +127,10 @@ nxp.betweenness_centrality(G)

# method 4 : using nx-parallel implementation with ParallelGraph object
nxp.betweenness_centrality(H)
-
-# output : {0: 0.0, 1: 0.6666666666666666, 2: 0.6666666666666666, 3: 0.0}
```

+For more on how to play with configurations in nx-parallel, refer to [Config.md](./Config.md)! Additionally, refer to the [NetworkX backend and config docs](https://networkx.org/documentation/latest/reference/backends.html) for more on the functionalities provided by NetworkX for backends and configs, like logging, `backend_priority`, etc. Another way to configure nx-parallel is by using [`joblib.parallel_config`](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html).
+
### Notes

1. Some functions in networkx have the same name but different implementations, so to avoid these name conflicts at the time of dispatching, networkx differentiates them by specifying the `name` parameter in the `_dispatchable` decorator of such algorithms. So, `method 3` and `method 4` are not recommended. But you can use them if you know the correct `name`. For example:
