[docs] adds expanded pipelines tutorial
martintb committed Feb 23, 2025
1 parent 6c7d219 commit 854ac9d
Showing 7 changed files with 192 additions and 31 deletions.
6 changes: 6 additions & 0 deletions docs/source/explanations/pipelines
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
.. warning::

   This section is a work in progress.

Pipelines
=========
6 changes: 6 additions & 0 deletions docs/source/how-to/appending_to_xarray.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
Appending Data to xarray.Datasets
=================================

.. warning::

   This tutorial is a work in progress.
6 changes: 6 additions & 0 deletions docs/source/how-to/building_xarray_datasets.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
Building xarray.Datasets from Scratch
======================================

.. warning::

   This tutorial is a work in progress.
9 changes: 3 additions & 6 deletions docs/source/how-to/index.rst
Original file line number Diff line number Diff line change
@@ -1,13 +1,10 @@
How-To Guides
=============

Practical step-by-step guides for working with AFL-agent.
Guides for specific tasks

.. toctree::
   :maxdepth: 2
   :caption: Available Guides:

   custom_pipeline
   data_preprocessing
   visualization
   deployment
   building_xarray_datasets
   appending_to_xarray
159 changes: 159 additions & 0 deletions docs/source/tutorials/building_piplines.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,159 @@
Building Pipelines
===================

.. warning::

   This tutorial is a work in progress.

Here we'll go into more detail on the Quick Start Example from
:doc:`getting_started`. In this example, we'll build a pipeline that
uses a Savitzky-Golay filter to compute the first derivative of each
measurement, computes the pairwise similarity between those derivatives,
clusters the data using spectral clustering, fits a Gaussian Process
classifier to the cluster labels, and finally uses an acquisition
function to choose the next sample.


Input Data
----------

First, let's define the input data for the pipeline. This codebase stores
data in :py:class:`xarray.Dataset` objects, a flexible data structure for
working with labeled, multi-dimensional data.

.. code-block:: python

    import numpy as np
    import xarray as xr

    # !!! these should be specific data so users understand the shape of the data
    measurements = ...  # data from your measurement (e.g., SANS, SAXS, UV-vis)
    x = ...  # x values of your data (e.g., q-values, energy, wavenumber, wavelength)
    compositions = ...  # composition of your samples

    # Create dataset
    ds = xr.Dataset(
        data_vars={
            'measurement': (['sample', 'x'], measurements),
            'composition': (['sample', 'components'], compositions)
        },
        coords={
            'x': x,
            'components': ['A', 'B', 'C']
        }
    )

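
To make the expected shapes concrete, the sketch below fills the placeholders
with synthetic data. Everything in it is an assumption made for illustration:
the number of samples, the ``x`` grid, the Gaussian peak shape, and the random
compositions are invented and do not come from AFL-agent.

.. code-block:: python

    import numpy as np
    import xarray as xr

    rng = np.random.default_rng(0)

    n_samples = 10   # hypothetical number of samples
    n_x = 150        # hypothetical number of x values per measurement

    # x grid shared by all measurements (arbitrary units)
    x = np.linspace(0.0, 10.0, n_x)

    # one synthetic Gaussian peak per sample, each with a random peak position
    centers = rng.uniform(3.0, 7.0, size=n_samples)
    measurements = np.exp(-0.5 * (x[None, :] - centers[:, None]) ** 2)

    # random three-component compositions; each row sums to 1
    compositions = rng.dirichlet(np.ones(3), size=n_samples)

    ds = xr.Dataset(
        data_vars={
            'measurement': (['sample', 'x'], measurements),
            'composition': (['sample', 'components'], compositions),
        },
        coords={
            'x': x,
            'components': ['A', 'B', 'C'],
        },
    )

    print(ds)

The ``measurement`` variable has dimensions ``(sample, x)`` and the
``composition`` variable has dimensions ``(sample, components)``, matching the
dimension names used in the dataset definition above.
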
.. warning::

   Show a screenshot of the dataset output.

   Plot the dataset?



Pipeline Step 1: Savitzky-Golay Filter
--------------------------------------

To begin, we'll instantiate a :py:class:`SavgolFilter` object using a context
manager (i.e., the ``with`` construct shown below). With this approach, each
pipeline operation defined inside the context is automatically added to the
``my_first_pipeline`` variable.


.. code-block:: python

    from AFL.double_agent import *

    with Pipeline() as my_first_pipeline:
        SavgolFilter(
            input_variable='measurement',
            output_variable='derivative',
            dim='x',
            derivative=1
        )

Going over the keyword arguments one by one:

- The ``input_variable`` keyword argument specifies the name of the variable
  in the dataset that will be used as the input to the Savitzky-Golay filter.
- The ``output_variable`` keyword argument specifies the name of the new
  variable that will be added to the dataset.
- The ``dim`` keyword argument specifies the dimension along which the filter
  will be applied.
- The ``derivative`` keyword argument specifies the order of the derivative to
  be computed.
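
For intuition about what this step computes, the sketch below applies a
Savitzky-Golay first derivative using SciPy's ``savgol_filter``. This is not
AFL-agent's implementation; the window length, polynomial order, and test
signal are assumptions chosen only for illustration.

.. code-block:: python

    import numpy as np
    from scipy.signal import savgol_filter

    # hypothetical test signal: sin(x) sampled on a uniform grid
    x = np.linspace(0.0, 2.0 * np.pi, 200)
    y = np.sin(x)
    dx = x[1] - x[0]

    # fit a local cubic in each 11-point window and evaluate its first
    # derivative; ``delta`` is the grid spacing used to scale the derivative
    dy = savgol_filter(y, window_length=11, polyorder=3, deriv=1, delta=dx)

    # the analytic derivative of sin(x) is cos(x)
    max_error = np.max(np.abs(dy - np.cos(x)))
    print(f"max abs error vs cos(x): {max_error:.2e}")

Because the filter fits a polynomial over a sliding window, it smooths noise
and differentiates in a single pass, which is why it is a common first step
before comparing measurement curves.
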

We can inspect the pipeline by printing the ``my_first_pipeline`` variable.

.. code-block:: python

    my_first_pipeline.print()

.. warning::

   Add a screenshot of the pipeline printout.

Finally, we can run the pipeline on the dataset and plot the results.

.. code-block:: python

    ds = my_first_pipeline.calculate(ds)
    ds.measurement.isel(sample=0).plot()
    ds.derivative.isel(sample=0).plot()

Pipeline Step 2: Similarity
---------------------------


Pipeline Step 3: Spectral Clustering
------------------------------------


Pipeline Step 4: Gaussian Process Classifier
--------------------------------------------


Pipeline Step 5: Acquisition Function
-------------------------------------


Full Pipeline
--------------

Let's look at the full pipeline defined all at once.


.. code-block:: python

    from AFL.double_agent import *

    with Pipeline() as pipeline:
        SavgolFilter(
            input_variable='measurement',
            output_variable='derivative',
            dim='x',
            derivative=1
        )
        Similarity(
            input_variable='derivative',
            output_variable='similarity',
            params={'metric': 'cosine'}
        )
        SpectralClustering(
            input_variable='similarity',
            output_variable='labels',
        )
        GaussianProcessClassifier(
            feature_input_variable='composition',
            predictor_input_variable='labels',
            output_prefix='extrap',
        )
        MaxValueAF(
            input_variable='extrap_variance',
            output_variable='next_sample'
        )

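
To illustrate what the similarity and clustering steps in the middle of this
pipeline do, here is a minimal sketch of cosine similarity followed by spectral
clustering on a precomputed affinity matrix. It is written with scikit-learn
rather than AFL-agent, so the ``Similarity`` and ``SpectralClustering``
operations above may differ in their details. The synthetic curves, the number
of clusters, and the ``random_state`` are assumptions for illustration.

.. code-block:: python

    import numpy as np
    from sklearn.cluster import SpectralClustering
    from sklearn.metrics.pairwise import cosine_similarity

    # two hypothetical families of derivative curves with different shapes
    x = np.linspace(0.0, 1.0, 50)
    group_a = np.stack([np.sin(2 * np.pi * x) + 0.01 * i for i in range(4)])
    group_b = np.stack([np.cos(2 * np.pi * x) + 0.01 * i for i in range(4)])
    derivative = np.vstack([group_a, group_b])      # shape (sample, x)

    # pairwise cosine similarity between samples, shape (sample, sample)
    similarity = cosine_similarity(derivative)

    # spectral clustering expects a non-negative affinity matrix
    affinity = np.clip(similarity, 0.0, 1.0)

    clusterer = SpectralClustering(
        n_clusters=2,
        affinity='precomputed',
        random_state=0,
    )
    labels = clusterer.fit_predict(affinity)
    print(labels)

Samples within each family should receive the same label. Which family is
labeled ``0`` and which is labeled ``1`` is arbitrary.
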
32 changes: 9 additions & 23 deletions docs/source/tutorials/getting_started.rst
Original file line number Diff line number Diff line change
@@ -1,21 +1,8 @@
Getting Started with AFL-agent
==============================

This tutorial will help you get started with AFL-agent by walking through a basic example of building and running a pipeline for phase mapping.

Prerequisites
-------------

Before starting, make sure you have:

1. Python 3.11 or later installed
2. AFL-agent installed (see :ref:`installation`)
3. Basic understanding of Python and NumPy

Quick Start Example
--------------------
===================

Here's a complete example that demonstrates how to build a pipeline for choosing a sample composition:
This short example will help you get started with AFL-agent. See
:doc:`building_pipelines` for a more detailed tutorial.

.. code-block:: python
@@ -87,13 +74,13 @@ Let's break down what's happening in this example:

- `SavgolFilter`: Calculates derivatives of the measurement data

- `Similarity`: Computes similarity between samples
- `Similarity`: Computes similarity between measurement data

- `SpectralClustering`: Groups similar samples together
- `SpectralClustering`: Groups (i.e., clusters) similar measurement data together

- `GaussianProcessClassifier`: Predicts phase boundaries
- `GaussianProcessClassifier`: Extrapolates the clustering labels for all compositions

- `MaxValueAF`: Selects the next sample to measure
- `MaxValueAF`: Selects the next sample to measure as the composition of highest entropy in phase label

4. We create a synthetic dataset with measurements and compositions
5. Finally, we run the pipeline on our dataset
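
The last two operations in the list above, extrapolating the cluster labels
and selecting the composition with the highest label entropy, can be sketched
with scikit-learn and NumPy. This is an illustration only and not AFL-agent's
implementation: the training compositions, the labels, the candidate grid, and
the default kernel are all assumptions.

.. code-block:: python

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessClassifier

    # hypothetical measured compositions (two coordinates) on a coarse grid
    composition = np.array(
        [[a, b]
         for a in np.linspace(0.1, 0.9, 5)
         for b in np.linspace(0.1, 0.9, 4)]
    )
    # two synthetic phases, split along the first coordinate
    labels = (composition[:, 0] > 0.5).astype(int)

    # fit a classifier that maps composition to cluster label
    gpc = GaussianProcessClassifier(random_state=0).fit(composition, labels)

    # extrapolate label probabilities onto a grid of candidate compositions
    grid = np.stack(
        np.meshgrid(np.linspace(0, 1, 21), np.linspace(0, 1, 21)), axis=-1
    ).reshape(-1, 2)
    proba = gpc.predict_proba(grid)                 # shape (candidate, class)

    # Shannon entropy of the predicted label distribution at each candidate
    entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)

    # pick the candidate whose predicted label is least certain
    next_sample = grid[np.argmax(entropy)]
    print(next_sample)

High entropy marks compositions where the classifier cannot decide between
labels, so measuring there is expected to be the most informative about where
the boundary between clusters lies.
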
@@ -103,6 +90,5 @@ Next Steps

Now that you've seen a basic example, you might want to:

* Learn more about :doc:`building_pipelines`
* Understand the :doc:`../explanations/architecture`
* See how to :doc:`../how-to/custom_pipeline`
* Build more complicated pipelines: :doc:`building_pipelines`

* Understand the pipeline concept: :doc:`../explanations/architecture`
5 changes: 3 additions & 2 deletions docs/source/tutorials/index.rst
Original file line number Diff line number Diff line change
@@ -1,10 +1,11 @@
Tutorials
=========

Learn how to build AFL-agent pipelines by following these step-by-step tutorials.
Step-by-step guides for beginners to use AFL-agent

.. toctree::
   :maxdepth: 1

   installation
   getting_started
   getting_started
   building_pipelines
