From 854ac9da8d791d7e5747eaf22ae86a9dff233add Mon Sep 17 00:00:00 2001
From: Tyler Martin
Date: Sun, 23 Feb 2025 09:46:30 -0500
Subject: [PATCH] [docs] adds expanded pipelines tutorial

---
 docs/source/explanations/pipelines            |   6 +
 docs/source/how-to/appending_to_xarray.rst    |   6 +
 .../how-to/building_xarray_datasets.rst       |   6 +
 docs/source/how-to/index.rst                  |   9 +-
 docs/source/tutorials/building_piplines.rst   | 159 ++++++++++++++++++
 docs/source/tutorials/getting_started.rst     |  32 +---
 docs/source/tutorials/index.rst               |   5 +-
 7 files changed, 192 insertions(+), 31 deletions(-)
 create mode 100644 docs/source/explanations/pipelines
 create mode 100644 docs/source/how-to/appending_to_xarray.rst
 create mode 100644 docs/source/how-to/building_xarray_datasets.rst
 create mode 100644 docs/source/tutorials/building_piplines.rst

diff --git a/docs/source/explanations/pipelines b/docs/source/explanations/pipelines
new file mode 100644
index 0000000..b7b964c
--- /dev/null
+++ b/docs/source/explanations/pipelines
@@ -0,0 +1,6 @@
+.. warning::
+
+    This section is a work in progress.
+
+Pipelines
+=========
\ No newline at end of file
diff --git a/docs/source/how-to/appending_to_xarray.rst b/docs/source/how-to/appending_to_xarray.rst
new file mode 100644
index 0000000..0e9d78d
--- /dev/null
+++ b/docs/source/how-to/appending_to_xarray.rst
@@ -0,0 +1,6 @@
+Appending Data to xarray.Datasets
+=================================
+
+.. warning::
+
+    This tutorial is a work in progress.
diff --git a/docs/source/how-to/building_xarray_datasets.rst b/docs/source/how-to/building_xarray_datasets.rst
new file mode 100644
index 0000000..eb4bcb3
--- /dev/null
+++ b/docs/source/how-to/building_xarray_datasets.rst
@@ -0,0 +1,6 @@
+Building xarray.Datasets from Scratch
+======================================
+
+.. warning::
+
+    This tutorial is a work in progress.
\ No newline at end of file
diff --git a/docs/source/how-to/index.rst b/docs/source/how-to/index.rst
index 2ab5624..609259c 100644
--- a/docs/source/how-to/index.rst
+++ b/docs/source/how-to/index.rst
@@ -1,13 +1,10 @@
 How-To Guides
 =============
 
-Practical step-by-step guides for working with AFL-agent.
+Guides for specific tasks
 
 .. toctree::
    :maxdepth: 2
-   :caption: Available Guides:
 
-   custom_pipeline
-   data_preprocessing
-   visualization
-   deployment
\ No newline at end of file
+   building_xarray_datasets
+   appending_to_xarray
diff --git a/docs/source/tutorials/building_piplines.rst b/docs/source/tutorials/building_piplines.rst
new file mode 100644
index 0000000..aa403a9
--- /dev/null
+++ b/docs/source/tutorials/building_piplines.rst
@@ -0,0 +1,159 @@
+Building Pipelines
+===================
+
+.. warning::
+
+    This tutorial is a work in progress.
+
+Here we'll go into more detail on the Quick Start Example from
+:doc:`getting_started`. In this example, we'll build a pipeline that
+uses a Savitzky-Golay filter to compute the first derivative of the
+measurement, then computes the similarity between the derivative and
+itself, then clusters the data using spectral clustering, and finally
+fits a Gaussian Process classifier to the data.
+
+
+Input Data
+----------
+
+First let's define the input data for the pipeline. This codebase uses
+:py:class:`xarray.Dataset` to store the data. This is a powerful and flexible
+data structure for working with multi-dimensional data.
+
+.. code-block:: python
+
+    import numpy as np
+    import xarray as xr
+
+    # !!! these should be specific data so users understand the shape of the data
+    measurements = ... # data from your measurement (e.g. SANS, SAXS, UV-vis, etc.)
+    x = ... # x values of your data (e.g. q-values, energy, wavenumber, wavelength, etc.)
+    compositions = ... # composition of your samples
+
+    # Create dataset
+    ds = xr.Dataset(
+        data_vars={
+            'measurement': (['sample', 'x'], measurements),
+            'composition': (['sample', 'components'], compositions)
+        },
+        coords={
+            'x': x,
+            'components': ['A', 'B', 'C']
+        }
+    )
+
+.. warning::
+
+    Show a screenshot of the dataset output.
+
+    Plot the dataset?
+
+
+
+Pipeline Step 1: Savitzky-Golay Filter
+--------------------------------------
+
+To begin, we'll instantiate a :py:class:`SavgolFilter` object using a context
+manager (i.e., the 'with' construct shown below). Using this approach, each
+Pipeline operation that is defined in the context is automatically added to the
+``my_first_pipeline`` variable.
+
+
+.. code-block:: python
+
+    from AFL.double_agent import *
+
+    with Pipeline() as my_first_pipeline:
+
+        SavgolFilter(
+            input_variable='measurement',
+            output_variable='derivative',
+            dim='x',
+            derivative=1
+        )
+
+Going over the keyword arguments one by one:
+
+- The ``input_variable`` keyword argument specifies the name of the variable in the dataset that will be used as
+  the input to the Savitzky-Golay filter.
+- The ``output_variable`` keyword argument specifies the name of the new variable that will be added to the dataset.
+- The ``dim`` keyword argument specifies the dimension along which the filter will be applied.
+- The ``derivative`` keyword argument specifies the order of the derivative to be computed.
+
+We can inspect the pipeline by printing the ``my_first_pipeline`` variable.
+
+.. code-block:: python
+
+    my_first_pipeline.print()
+
+.. warning::
+
+    Add a screenshot of the pipeline printout.
+
+Finally, we can run the pipeline on the dataset and plot the results.
+
+.. code-block:: python
+
+    ds = my_first_pipeline.calculate(ds)
+
+    ds.measurement.isel(sample=0).plot()
+    ds.derivative.isel(sample=0).plot()
+
+
+Pipeline Step 2: Similarity
+---------------------------
+
+
+Pipeline Step 3: Spectral Clustering
+------------------------------------
+
+
+Pipeline Step 4: Gaussian Process Classifier
+--------------------------------------------
+
+
+Pipeline Step 5: Acquisition Function
+-------------------------------------
+
+
+Full Pipeline
+--------------
+
+Let's look at the full pipeline defined all at once.
+
+
+.. code-block:: python
+
+    from AFL.double_agent import *
+
+    with Pipeline() as pipeline:
+
+        SavgolFilter(
+            input_variable='measurement',
+            output_variable='derivative',
+            dim='x',
+            derivative=1
+        )
+
+        Similarity(
+            input_variable='derivative',
+            output_variable='similarity',
+            params={'metric': 'cosine'}
+        )
+
+        SpectralClustering(
+            input_variable='similarity',
+            output_variable='labels',
+        )
+
+        GaussianProcessClassifier(
+            feature_input_variable='composition',
+            predictor_input_variable='labels',
+            output_prefix='extrap',
+        )
+
+        MaxValueAF(
+            input_variable='extrap_variance',
+            output_variable='next_sample'
+        )
+
diff --git a/docs/source/tutorials/getting_started.rst b/docs/source/tutorials/getting_started.rst
index 3127945..3a1d917 100644
--- a/docs/source/tutorials/getting_started.rst
+++ b/docs/source/tutorials/getting_started.rst
@@ -1,21 +1,8 @@
-Getting Started with AFL-agent
-==============================
-
-This tutorial will help you get started with AFL-agent by walking through a basic example of building and running a pipeline for phase mapping.
-
-Prerequisites
--------------
-
-Before starting, make sure you have:
-
-1. Python 3.11 or later installed
-2. AFL-agent installed (see :ref:`installation`)
-3. Basic understanding of Python and NumPy
-
 Quick Start Example
--------------------
+===================
 
-Here's a complete example that demonstrates how to build a pipeline for choosing a sample composition:
+This short example will help you get started with AFL-agent. See
+:doc:`building_pipelines` for a more detailed tutorial.
 
 .. code-block:: python
 
@@ -87,13 +74,13 @@ Let's break down what's happening in this example:
 
    - `SavgolFilter`: Calculates derivatives of the measurement data
 
-   - `Similarity`: Computes similarity between samples
+   - `Similarity`: Computes similarity between measurement data
 
-   - `SpectralClustering`: Groups similar samples together
+   - `SpectralClustering`: Groups (i.e., clusters) similar measurement data together
 
-   - `GaussianProcessClassifier`: Predicts phase boundaries
+   - `GaussianProcessClassifier`: Extrapolates the clustering labels for all compositions
 
-   - `MaxValueAF`: Selects the next sample to measure
+   - `MaxValueAF`: Selects the next sample to measure as the composition of highest entropy in phase label
 
 4. We create a synthetic dataset with measurements and compositions
 5. Finally, we run the pipeline on our dataset
@@ -103,6 +90,5 @@ Next Steps
 
 Now that you've seen a basic example, you might want to:
 
-* Learn more about :doc:`building_pipelines`
-* Understand the :doc:`../explanations/architecture`
-* See how to :doc:`../how-to/custom_pipeline`
\ No newline at end of file
+* Build more complicated pipelines: :doc:`building_pipelines`
+* Understand the pipeline concept: :doc:`../explanations/architecture`
\ No newline at end of file
diff --git a/docs/source/tutorials/index.rst b/docs/source/tutorials/index.rst
index 65c9699..f323840 100644
--- a/docs/source/tutorials/index.rst
+++ b/docs/source/tutorials/index.rst
@@ -1,10 +1,11 @@
 Tutorials
 =========
 
-Learn how to build AFL-agent pipelines by following these step-by-step tutorials.
+Step-by-step guides for beginners to use AFL-agent
 
 .. toctree::
    :maxdepth: 1
 
    installation
-   getting_started
\ No newline at end of file
+   getting_started
+   building_pipelines
\ No newline at end of file
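
Note on the Input Data block of building_piplines.rst: it leaves ``measurements``, ``x``, and ``compositions`` as ``...`` placeholders, and its own comment asks for specific data. A minimal sketch of stand-in arrays with the shapes that block's ``xr.Dataset`` call expects might look like the following. The sizes, the random compositions, and the power-law curves are assumptions made up for illustration; they are not data or helpers from AFL-agent.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_samples, n_x = 5, 50

rng = np.random.default_rng(seed=0)

# 'x' coordinate (e.g. q-values) on a log-spaced grid, shape (n_x,)
x = np.geomspace(1e-3, 1.0, n_x)

# 'composition': one row per sample, one column per component A, B, C.
# Each row is normalized so the three fractions sum to 1, shape (n_samples, 3).
compositions = rng.random((n_samples, 3))
compositions /= compositions.sum(axis=1, keepdims=True)

# 'measurement': one curve per sample, shape (n_samples, n_x).
# Assumed toy model: a power law whose exponent grows with the fraction of A.
exponents = 1.0 + 3.0 * compositions[:, [0]]      # shape (n_samples, 1)
measurements = x[np.newaxis, :] ** (-exponents)   # broadcasts to (n_samples, n_x)

print(measurements.shape, x.shape, compositions.shape)
```

Arrays shaped like these line up with the dimensions declared in the tutorial's ``xr.Dataset(...)`` call: ``measurement`` over ``('sample', 'x')`` and ``composition`` over ``('sample', 'components')`` with the three component labels.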