Fast particle deposition at post-processing time
This small Python library implements fundamental grid deposition algorithms to analyse (rectilinear) grid + particle datasets, with an emphasize on performance. Core algorithms are implemented as Cython extensions.
GPGI stands for Generic Particle + Grid data Interface
python -m pip install --upgrade pip
python -m pip install gpgi
A rectilinear grid is defined as 1D arrays representing cell left edges in each directions. Note that the last point of such an array is interpreted as the right edge of the rightmost cell, so for instance, a 1D grid containing 100 cells is defined by 101 edges.
Particles are defined as points that live within the grid's bounds.
Deposition is the action of going from particle description to a grid description of a field. It is useful to analyze, compare and combine simulation data that exists in a combination of the two formalisms. This process is not reversible as it degrades information.
For instance, here's a simple overlay of a particle set (red dots) against a background that represents the deposited particle count.
This example illustrates the simplest possible deposition method, "Nearest Grid Point" (NGP), in which each particle contributes only to the cell that contains it.
More refined methods are also available.
method name | abbreviated name | order |
---|---|---|
Nearest Grid Point | NGP | 0 |
Cloud in Cell | CIC | 1 |
Triangular Shaped Cloud | TSC | 2 |
new in gpgi 0.12.0
User-defined alternative methods may be provided to Dataset.deposit
as method=my_func
.
Their signature need to be compatible with gpgi.typing.DepositionMethodT
or gpgi.typing.DepositionMethodWithMetadataT
.
geometry name | axes order |
---|---|
cartesian | x, y, z |
polar | radius, z, azimuth |
cylindrical | radius, azimuth, z |
spherical | radius, colatitude, azimuth |
equatorial | radius, azimuth, latitude |
An important step in performing deposition is to associate particle indices to cell indices. This step is called "particle indexing". In directions where the grid is uniformly stepped (if any), indexing a particle is an O(1) operation. In the more general case, indexing is performed by bisection, which is a O(log(nx)) operation (where nx represents the number of cells in the direction of interest).
The API consists in a load
function, which returns a Dataset
object.
Load data
import numpy as np
import gpgi
nx = ny = 64
nparticles = 600_000
prng = np.random.RandomState(0)
ds = gpgi.load(
geometry="cartesian",
grid={
"cell_edges": {
"x": np.linspace(-1, 1, nx),
"y": np.linspace(-1, 1, ny),
},
},
particles={
"coordinates": {
"x": 2 * (prng.normal(0.5, 0.25, nparticles) % 1 - 0.5),
"y": 2 * (prng.normal(0.5, 0.25, nparticles) % 1 - 0.5),
},
"fields": {
"mass": np.ones(nparticles),
},
},
)
The Dataset
object holds a grid
and a particles
attribute,
which both hold a fields
attribute for accessing their data.
But more importantly, the Dataset
has a deposit
method to
translate particle fields to the grid formalism.
Deposit Particle fields on the grid
particle_mass = ds.deposit("mass", method="nearest_grid_point") # or "ngp" for shorts
Visualize
In this example we'll use matplotlib
for rendering, but note that matplotlib
is not a dependency to gpgi
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.set(aspect=1, xlabel="x", ylabel="y")
im = ax.pcolormesh(
"x",
"y",
particle_mass.T,
data=ds.grid.cell_edges,
cmap="viridis",
)
fig.colorbar(im, ax=ax)
The example script given here takes about a second (top to bottom).
new in gpgi 0.4.0
Dataset
objects have a special attribute metadata
which is a dictionary with string keys.
This attribute is meant to hold any special metadata that may be relevant for labelling or processing (e.g. simulation time, author, ...).
Metadata can be supplied at load time as
ds = gpgi.load(
geometry="cartesian",
grid=...,
particles=...,
metadata={"simulation_time": 12.5, "author": "Clément Robert"}
)
new in gpgi 0.5.0
With CIC and TSC deposition, particles contribute to cells neighbouring the one
that contains them. For particles that live in the outermost layer of the
domain, this means some of their contribution is lost. This behavior
corresponds to the default 'open'
boundary condition, but gpgi
has builtin
support for more conservative boundary conditions.
Boundary conditions can selected per field, per axis and per side. Builtin recipes all perform linear combinations of ghost layers (same-side and opposite side) and active domain layers (same-side and opposite side), and replace the same-side active layer with the result.
User-selected boundary conditions take the form of an optional argument to
Dataset.deposit
, as dictionary with keys being axes names, and values being
2-tuples of boundary conditions names (for left and right side respectively).
For instance, here's how one would require periodic boundary conditions on all axes:
ds.deposit(
"mass",
method="cic",
boundaries={
"x": ("periodic", "periodic"),
"y": ("periodic", "periodic"),
}
)
Unspecified axes will use the default 'open'
boundary.
boundary conditions | description | conservative ? |
---|---|---|
open (default) | no special treatment | no |
periodic | add opposite ghost layer to the active domain | yes |
wall | add same-side ghost layer to the active domain | yes |
antisymmetric | subtract same-side ghost layer from the active domain | no |
gpgi
's boundary recipes can be customized. Let's illustrate this feature with a simple example.
Say we want to fix the value of the deposited field in some outer layer.
This is done by defining a new function on the user side:
def ones(
same_side_active_layer,
same_side_ghost_layer,
opposite_side_active_layer,
opposite_side_ghost_layer,
weight_same_side_active_layer,
weight_same_side_ghost_layer,
weight_opposite_side_active_layer,
weight_opposite_side_ghost_layer,
side,
metadata,
):
return 1.0
where all first eight arguments are numpy.ndarray
objects with the same shape (which includes ghost padding !),
to which the return value must be broadcastable, side
can only be either
"left"
or "right"
, and metadata
is the special Dataset.metadata
attribute. Not all arguments need be used in the body of the function, but this
signature is required.
The method must then be registered as a boundary condition recipe as
ds.boundary_recipes.register("ones", ones)
where the associated key (here "ones"
) is arbitrary. The recipe can now be
used exactly as builtin ones, and all of them can be mixed arbitrarily.
ds.deposit(
"mass",
method="cic",
boundaries={
"x": ("ones", "wall"),
"y": ("periodic", "periodic"),
}
)
Note that all first eight arguments in a boundary recipe function should
represent an extensive physical quantity (as opposed to intensive). When
depositing an intensive quantity u
, a weight field w
should be supplied
(see next section), in which case, the first four arguments represent u*w
and
the following four represent w, so that u can still be obtained within the
function as a ratio if needed.
new in gpgi 0.7.0
Fundamentally, deposition algorithms construct on-grid fields by performing summations. An implication is that the physical quantities being deposited are required to be extensive (like mass or momentum). Intensive quantities (like velocity or temperature) require additional operations, and necessitate the use of an additional weight field.
This section provides showcases their usage. For a detailed explanation of the deposition algorithm for intensive quantities, see Deposition algorithm.
In order to deposit an intensive field (e.g., vx
), an additional weight_field
argument must be provided as
ds.deposit(
"vx",
method="cic",
boundaries={
"y": ("periodic", "periodic"),
"x": ("antisymmetric", "antisymmetric"),
},
weight_field="mass",
weight_field_boundaries={
"y": ("periodic", "periodic"),
"x": ("open", "open"),
},
)
Boundary recipes may be also associated to the weight field with the
weight_field_boundaries
argument. This arguments becomes required if
boundaries
and weight_field
are both provided.
Call help(ds.deposit)
for more detail.
new in gpgi 0.14.0
gpgi can load arbitrarily ordered particle sets, though deposition algorithms perform better when the in-memory position of particles correlates with their physical positions relative to the grid, since such a state minimizes the number of cache misses.
Particles may be sorted by a counting sort algorithm, as
ds = ds.sorted()
Note that this method returns a copy of the dataset, so it will best perform for datasets that, at most, fit in half your RAM.
This operation is costly in itself, so there may be a trade-off depending on how many depositions one needs to perform on a given dataset before tossing it out.
By default, axes are weighted in the order that's optimal for gpgi's deposition routines, but arbitrary priority order may be specified as, for instance
ds = ds.sorted(axes=(1, 0))
Use the Dataset.is_sorted
method to check whether particles are already sorted without
performing the sort. Dataset.is_sorted
accepts an axes
argument just like Dataset.sorted
. This is useful for testing and comparative purposes.
This section provides details on the general deposition algorithm, as
implemented in gpgi
.
Without loss of generality, we will illustrate how an intensive field (v
)
is deposited, since this case requires the most computational steps. As it
happens, depositing an extensive field (w
) separately is actually part of
the algorithm.
Definitions
v
is an intensive field that we want to deposit on the gridw
is an extensive field that will be used as weightsu = v * w
is an extensive equivalent tov
(conceptually, ifv
is a velocity andw
is a mass,u
corresponds to a momentum)
u(i)
, v(i)
and w(i)
are defined for each particle i
.
We note U(x)
, V(x)
and W(x)
the corresponding on-grid fields, where V(x)
is the final output of the algorithm. These are defined at grid cell centers
x
, within the active domain.
Last, we note U'(x)
, V'(x)
and W'(x)
the raw deposited fields, meaning
no special treatment is applied to the outermost layers (boundary conditions).
These are defined at grid cell centers, including one ghost layer that will be
used to apply boundary conditions.
Algorithm
W'
andU'
are computed as
W'(x) = Σ c(i,x) w(i)
U'(x) = Σ c(i,x) w(i) v(i)
where c(i,x)
are geometric coefficients associated with the deposition method. Taking the nearest grid point (NGP) method for illustration, c(i,x) = 1
if particle i
is contained in the cell whose center is x
, and c(i,x) = 0
elsewhere.
- boundary conditions are applied
W(x) = W_BCO(W', 1, metadata)
U(x) = U_BCO(U', W', metadata)
where W_BCO
and U_BCO
denote arbitrary boundary condition operators
associated with W
and U
respectively, and which take 3 arguments,
representing the field to be transformed, its associated weight field and a
wildcard metadata
argument which may contain any additional data relevant to
the operator.
Note 1
is used a placeholder "weight" for W
, for symmetry reasons: all boundary condition operators must expose a similar interface, as explained in Define custom recipes.
- Finally,
V(x)
is obtained as
V(x) = (U/W)(x)
Starting in gpgi 2.0.0, thread safety is guaranteed in Dataset.host_cell_index
computation and Dataset.deposit
, and both operations release the
GIL (Global Interpreter Lock) around their respective hotloops. Thread safety is
also tested against the experimental free-threaded build of Python 3.13.
Note that, by default, Dataset.deposit
still uses a lock per Dataset
instance, which in the most general case is preferable since concurrently
depositing many fields can cause catastrophic degradations of performances as
it encourages cache misses. Optimal performance is however application-specific,
so this strategy can be overridden using the lock
parameter:
- using
lock=None
will not use any lock, which in restricted conditions leads to better walltime performances - alternatively, an externally managed
threading.Lock
instance may be supplied
Dataset.boundary_recipes.register
is also thread-safe: registering a shared
function multiple times is supported, but an error is raised in case one
attempts registering a different function under an existing key.