From 854ac9da8d791d7e5747eaf22ae86a9dff233add Mon Sep 17 00:00:00 2001
From: Tyler Martin <tyler.martin@nist.gov>
Date: Sun, 23 Feb 2025 09:46:30 -0500
Subject: [PATCH] [docs] adds expanded pipelines tutorial

---
 docs/source/explanations/pipelines            |   6 +
 docs/source/how-to/appending_to_xarray.rst    |   6 +
 .../how-to/building_xarray_datasets.rst       |   6 +
 docs/source/how-to/index.rst                  |   9 +-
 docs/source/tutorials/building_piplines.rst   | 159 ++++++++++++++++++
 docs/source/tutorials/getting_started.rst     |  32 +---
 docs/source/tutorials/index.rst               |   5 +-
 7 files changed, 192 insertions(+), 31 deletions(-)
 create mode 100644 docs/source/explanations/pipelines
 create mode 100644 docs/source/how-to/appending_to_xarray.rst
 create mode 100644 docs/source/how-to/building_xarray_datasets.rst
 create mode 100644 docs/source/tutorials/building_piplines.rst

diff --git a/docs/source/explanations/pipelines b/docs/source/explanations/pipelines
new file mode 100644
index 0000000..b7b964c
--- /dev/null
+++ b/docs/source/explanations/pipelines
@@ -0,0 +1,6 @@
+Pipelines
+=========
+
+.. warning::
+
+    This section is a work in progress.
\ No newline at end of file
diff --git a/docs/source/how-to/appending_to_xarray.rst b/docs/source/how-to/appending_to_xarray.rst
new file mode 100644
index 0000000..0e9d78d
--- /dev/null
+++ b/docs/source/how-to/appending_to_xarray.rst
@@ -0,0 +1,6 @@
+Appending Data to xarray.Datasets
+=================================
+
+.. warning::
+
+    This guide is a work in progress.
diff --git a/docs/source/how-to/building_xarray_datasets.rst b/docs/source/how-to/building_xarray_datasets.rst
new file mode 100644
index 0000000..eb4bcb3
--- /dev/null
+++ b/docs/source/how-to/building_xarray_datasets.rst
@@ -0,0 +1,6 @@
+Building xarray.Datasets from Scratch
+======================================
+
+.. warning::
+
+    This guide is a work in progress.
\ No newline at end of file
diff --git a/docs/source/how-to/index.rst b/docs/source/how-to/index.rst
index 2ab5624..609259c 100644
--- a/docs/source/how-to/index.rst
+++ b/docs/source/how-to/index.rst
@@ -1,13 +1,10 @@
 How-To Guides
 =============
 
-Practical step-by-step guides for working with AFL-agent.
+Guides for specific tasks.
 
 .. toctree::
    :maxdepth: 2
-   :caption: Available Guides:
 
-   custom_pipeline
-   data_preprocessing
-   visualization
-   deployment 
\ No newline at end of file
+   building_xarray_datasets
+   appending_to_xarray
diff --git a/docs/source/tutorials/building_piplines.rst b/docs/source/tutorials/building_piplines.rst
new file mode 100644
index 0000000..aa403a9
--- /dev/null
+++ b/docs/source/tutorials/building_piplines.rst
@@ -0,0 +1,159 @@
+Building Pipelines
+===================
+
+.. warning::
+
+    This tutorial is a work in progress.
+
+Here we'll go into more detail on the Quick Start Example from
+:doc:`getting_started`. In this example, we'll build a pipeline that
+uses a Savitzky-Golay filter to compute the first derivative of the
+measurement, computes the pairwise similarity between the derivatives,
+clusters the data using spectral clustering, fits a Gaussian Process
+classifier to the cluster labels, and selects the next sample to measure.
+
+
+Input Data
+----------
+
+First let's define the input data for the pipeline. This codebase uses
+:py:class:`xarray.Dataset` to store the data. This is a powerful and flexible
+data structure for working with multi-dimensional data.
+
+.. code-block:: python
+
+   import numpy as np
+   import xarray as xr
+
+   # Synthetic placeholder data (illustrative shapes): 10 samples, 100 x-values, 3 components
+   measurements = np.random.rand(10, 100)  # your measurement (e.g. SANS, SAXS, UV-vis, etc.)
+   x = np.linspace(0, 1, 100)  # x values of your data (e.g. q-values, energy, wavelength, etc.)
+   compositions = np.random.rand(10, 3)  # composition of each sample
+
+   # Create dataset
+   ds = xr.Dataset(
+       data_vars={
+           'measurement': (['sample', 'x'], measurements),
+           'composition': (['sample', 'components'], compositions)
+       },
+       coords={
+           'x': x,
+           'components': ['A', 'B', 'C']
+       }
+   )
+
+.. warning::
+
+   Show a screenshot of the dataset output.
+
+   Plot the dataset?
+
+
+
+Pipeline Step 1: Savitzky-Golay Filter
+--------------------------------------
+
+To begin, we'll instantiate a :py:class:`SavgolFilter` object inside a context
+manager (i.e., the ``with`` construct shown below). Using this approach, each
+pipeline operation defined in the context is automatically added to the
+``my_first_pipeline`` variable.
+
+
+.. code-block:: python
+
+   from AFL.double_agent import *
+
+   with Pipeline() as my_first_pipeline:
+
+       SavgolFilter(
+           input_variable='measurement',
+           output_variable='derivative',
+           dim='x',
+           derivative=1
+       )
+
+Going over the keyword arguments one by one:
+
+- The ``input_variable`` keyword argument specifies the name of the variable in the dataset that will be used as
+  the input to the Savitzky-Golay filter.
+- The ``output_variable`` keyword argument specifies the name of the new variable that will be added to the dataset.
+- The ``dim`` keyword argument specifies the dimension along which the filter will be applied.
+- The ``derivative`` keyword argument specifies the order of the derivative to be computed.
+
+We can inspect the pipeline by printing the ``my_first_pipeline`` variable.
+
+.. code-block:: python
+
+   my_first_pipeline.print()
+
+.. warning::
+
+   Add a screenshot of the pipeline printout.
+
+Finally, we can run the pipeline on the dataset and plot the results.
+
+.. code-block:: python
+
+   ds = my_first_pipeline.calculate(ds)
+
+   ds.measurement.isel(sample=0).plot()
+   ds.derivative.isel(sample=0).plot()
+
+
+Pipeline Step 2: Similarity
+---------------------------
+
+
+Pipeline Step 3: Spectral Clustering
+------------------------------------
+
+
+Pipeline Step 4: Gaussian Process Classifier
+--------------------------------------------
+
+
+Pipeline Step 5: Acquisition Function
+-------------------------------------
+
+
+Full Pipeline
+--------------
+
+Let's look at the full pipeline defined all at once.
+
+
+.. code-block:: python
+
+   from AFL.double_agent import *
+
+   with Pipeline() as pipeline:
+
+       SavgolFilter(
+           input_variable='measurement',
+           output_variable='derivative',
+           dim='x',
+           derivative=1
+       )
+
+       Similarity(
+           input_variable='derivative',
+           output_variable='similarity',
+           params={'metric': 'cosine'}
+       )
+
+       SpectralClustering(
+           input_variable='similarity',
+           output_variable='labels',
+       )
+
+       GaussianProcessClassifier(
+           feature_input_variable='composition',
+           predictor_input_variable='labels',
+           output_prefix='extrap',
+       )
+
+       MaxValueAF(
+           input_variable='extrap_variance',
+           output_variable='next_sample'
+       )
+
diff --git a/docs/source/tutorials/getting_started.rst b/docs/source/tutorials/getting_started.rst
index 3127945..3a1d917 100644
--- a/docs/source/tutorials/getting_started.rst
+++ b/docs/source/tutorials/getting_started.rst
@@ -1,21 +1,8 @@
-Getting Started with AFL-agent
-==============================
-
-This tutorial will help you get started with AFL-agent by walking through a basic example of building and running a pipeline for phase mapping.
-
-Prerequisites
--------------
-
-Before starting, make sure you have:
-
-1. Python 3.11 or later installed
-2. AFL-agent installed (see :ref:`installation`)
-3. Basic understanding of Python and NumPy
-
 Quick Start Example
---------------------
+===================
 
-Here's a complete example that demonstrates how to build a pipeline for choosing a sample composition:
+This short example will help you get started with AFL-agent. See
+:doc:`building_pipelines` for a more detailed tutorial.
 
 .. code-block:: python
 
@@ -87,13 +74,13 @@ Let's break down what's happening in this example:
 
    - `SavgolFilter`: Calculates derivatives of the measurement data
 
-   - `Similarity`: Computes similarity between samples
+   - `Similarity`: Computes similarity between measurement data
 
-   - `SpectralClustering`: Groups similar samples together
+   - `SpectralClustering`: Groups (i.e., clusters) similar measurement data together
 
-   - `GaussianProcessClassifier`: Predicts phase boundaries
+   - `GaussianProcessClassifier`: Extrapolates the clustering labels for all compositions
 
-   - `MaxValueAF`: Selects the next sample to measure
+   - `MaxValueAF`: Selects the next sample to measure: the composition with the highest entropy in the predicted phase label
 
 4. We create a synthetic dataset with measurements and compositions
 5. Finally, we run the pipeline on our dataset
@@ -103,6 +90,5 @@ Next Steps
 
 Now that you've seen a basic example, you might want to:
 
-* Learn more about :doc:`building_pipelines`
-* Understand the :doc:`../explanations/architecture`
-* See how to :doc:`../how-to/custom_pipeline` 
\ No newline at end of file
+* Build more complicated pipelines: :doc:`building_pipelines`
+* Understand the pipeline concept: :doc:`../explanations/architecture`
\ No newline at end of file
diff --git a/docs/source/tutorials/index.rst b/docs/source/tutorials/index.rst
index 65c9699..f323840 100644
--- a/docs/source/tutorials/index.rst
+++ b/docs/source/tutorials/index.rst
@@ -1,10 +1,11 @@
 Tutorials
 =========
 
-Learn how to build AFL-agent pipelines by following these step-by-step tutorials.
+Step-by-step guides for beginners learning to use AFL-agent.
 
 .. toctree::
    :maxdepth: 1
 
    installation
-   getting_started
\ No newline at end of file
+   getting_started
+   building_pipelines
\ No newline at end of file