[feature] Setup API auto doc (#429)

*Issue #, if available:*

*Description of changes:* This PR sets up the new documentation-generation mechanism, creates new API doc rst files, and modifies existing Python code for the doc files.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

---------

Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Theodore Vasiloudis <[email protected]>
awslabs · Sep 12, 2023 · 1c06aac
1 parent c63fb1f

Showing 19 changed files with 342 additions and 24 deletions.
diff --git a/.readthedocs.yaml b/.readthedocs.yaml
@@ -0,0 +1,30 @@
+# .readthedocs.yaml
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Set the OS, Python version and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.9"
+
+# Build documentation in the "docs/" directory with Sphinx
+sphinx:
+   configuration: docs/source/conf.py
+
+# Optionally build your docs in additional formats such as PDF and ePub
+# formats:
+#    - pdf
+#    - epub
+
+# Optional but recommended, declare the Python requirements required
+# to build your documentation
+# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+python:
+   install:
+   - method: pip
+     path: .
+   - requirements: docs/requirements.txt
diff --git a/docs/requirements.txt b/docs/requirements.txt
@@ -0,0 +1,6 @@
+sphinx==7.1.2
+sphinx-rtd-theme==1.3.0
+--extra-index-url https://download.pytorch.org/whl/cpu
+torch==1.13.1+cpu
+-f https://data.dgl.ai/wheels-internal/repo.html
+dgl==1.0.4
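The requirements file mixes pip options with version pins: `--extra-index-url` adds the PyTorch CPU wheel index, `-f` (short for `--find-links`) points pip at the DGL wheel page, and the `+cpu` suffix in `torch==1.13.1+cpu` is a PEP 440 local version label. A minimal sketch that separates the two kinds of lines; the helper `parse_pins` is hypothetical and not part of GraphStorm or pip, and it only understands the simple `==` pins used here.

```python
from __future__ import annotations

# Contents of docs/requirements.txt from this commit.
REQUIREMENTS = """\
sphinx==7.1.2
sphinx-rtd-theme==1.3.0
--extra-index-url https://download.pytorch.org/whl/cpu
torch==1.13.1+cpu
-f https://data.dgl.ai/wheels-internal/repo.html
dgl==1.0.4
"""


def parse_pins(text: str) -> tuple[dict[str, str], list[str]]:
    """Split a requirements file into `name==version` pins and pip options.

    Hypothetical helper: it does not implement full PEP 508 syntax.
    """
    pins: dict[str, str] = {}
    options: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("-"):
            # Options such as --extra-index-url or -f change where pip looks
            # for packages; they do not name a package themselves.
            options.append(line)
            continue
        name, _, version = line.partition("==")
        pins[name] = version
    return pins, options


pins, options = parse_pins(REQUIREMENTS)
# The "+cpu" part of the torch pin is a PEP 440 local version label.
public, _, local = pins["torch"].partition("+")
print(public, local)  # 1.13.1 cpu
print(options)
```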
diff --git a/docs/source/_templates/classtemplate.rst b/docs/source/_templates/classtemplate.rst
@@ -0,0 +1,13 @@
+.. role:: hidden
+    :class: hidden-section
+.. currentmodule:: {{ module }}
+
+
+{{ name | underline}}
+
+.. autoclass:: {{ name }}
+    :show-inheritance:
+    :members: prepare_data, get_node_feats, get_edge_feats, get_labels, forward, get_sparse_params, 
+              get_general_dense_parameters, get_lm_dense_parameters, save_model, remove_saved_model,
+              save_topk_models, get_best_model_path, restore_model, fit, eval, infer, evaluate,
+              do_eval, compute_score, predict
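Autosummary renders this Jinja template once per class listed under an `autosummary` directive and, since `autosummary_generate` is on by default in recent Sphinx releases, writes the result into the `:toctree:` directory (`../generated/` here). The `underline` filter is supplied by autosummary and, by default, repeats `=` to the length of the title. The sketch below approximates the generated stub with plain string formatting; `render_stub` is a hypothetical helper, and the real output comes from Sphinx's Jinja environment.

```python
from __future__ import annotations


def underline(title: str, line: str = "=") -> str:
    # Mirrors autosummary's `underline` filter: the title followed by a rule
    # of the same length.
    if "\n" in title:
        raise ValueError("Can only underline single lines")
    return title + "\n" + line * len(title)


def render_stub(module: str, name: str, members: list[str]) -> str:
    """Approximate the stub classtemplate.rst produces for one class."""
    member_list = ", ".join(members)
    return (
        ".. role:: hidden\n"
        "    :class: hidden-section\n"
        f".. currentmodule:: {module}\n"
        "\n\n"
        f"{underline(name)}\n"
        "\n"
        f".. autoclass:: {name}\n"
        "    :show-inheritance:\n"
        f"    :members: {member_list}\n"
    )


stub = render_stub("graphstorm.trainer", "GSgnnTrainer", ["fit", "eval", "save_model"])
print(stub)
```

Because one shared `:members:` list is applied to every class that uses this template, autodoc may emit warnings for listed names that a particular class does not have.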
diff --git a/docs/source/api/graphstorm.customized.rst b/docs/source/api/graphstorm.customized.rst
@@ -0,0 +1,62 @@
+.. _apicustomized:
+
+Customized model APIs
+==========================
+
+    GraphStorm provides a set of APIs that let users integrate their own customized models with
+    the GraphStorm framework, so that these models can leverage GraphStorm's easy-to-use and
+    distributed capabilities.
+
+    For how to adapt your own models, please refer to the :ref:`Use Your Own Model Tutorial
+    <use-own-models>`.
+
+    In general, there are three sets of APIs involved in programming customized models.
+
+    * Dataloaders: users need to extend GraphStorm's abstract node or edge dataloader to implement
+      their own graph samplers or mini-batch generators.
+    * Models: depending on the specific GML task, users need to extend the corresponding ModelBase
+      and ModelInterface classes, and then implement the required abstract methods.
+    * Evaluators: if necessary, users can also extend the two evaluator templates to implement
+      their own performance evaluation methods.
+
+.. currentmodule:: graphstorm
+
+Dataloaders
+------------
+.. autosummary::
+    :toctree: ../generated/
+    :nosignatures:
+    :template: classtemplate.rst
+
+    .. dataloading.AbsNodeDataLoader
+    .. dataloading.AbsEdgeDataLoader
+
+Models
+------------
+
+.. autosummary::
+    :toctree: ../generated/
+    :nosignatures:
+    :template: classtemplate.rst
+
+    model.GSgnnModelBase
+    model.GSgnnNodeModelBase
+    model.GSgnnEdgeModelBase
+    model.GSgnnLinkPredictionModelBase
+    model.GSgnnNodeModelInterface
+    model.GSgnnEdgeModelInterface
+    model.GSgnnLinkPredictionModelInterface
+
+Evaluators
+------------
+
+    If users want to implement customized evaluators or evaluation methods, the recommended
+    practice is to extend the ``eval.GSgnnInstanceEvaluator`` class and implement its abstract methods.
+
+.. autosummary::
+    :toctree: ../generated/
+    :nosignatures:
+    :template: classtemplate.rst
+
+    eval.GSgnnInstanceEvaluator
+    eval.GSgnnLPEvaluator
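The customization pattern described on this page is the ordinary Python abstract-base-class pattern: subclass a base, implement its abstract methods, and let the framework call them. The sketch below uses hypothetical stand-in classes (`ToyNodeModelBase`, `ToyInstanceEvaluator`) with made-up signatures rather than GraphStorm's real ones, so it runs without GraphStorm installed; the generated pages for `model.GSgnnNodeModelBase` and `eval.GSgnnInstanceEvaluator` document the actual methods.

```python
from __future__ import annotations

import abc


class ToyNodeModelBase(abc.ABC):
    """Hypothetical stand-in for a GraphStorm model base class."""

    @abc.abstractmethod
    def forward(self, feats: list[float]) -> list[float]:
        """Compute one score per node."""

    @abc.abstractmethod
    def predict(self, feats: list[float]) -> list[int]:
        """Turn scores into class predictions."""


class ToyInstanceEvaluator(abc.ABC):
    """Hypothetical stand-in for an instance evaluator template."""

    @abc.abstractmethod
    def compute_score(self, pred: list[int], labels: list[int]) -> dict[str, float]:
        """Return a mapping of metric name to value."""


class ThresholdModel(ToyNodeModelBase):
    def __init__(self, weight: float, threshold: float = 0.5):
        self.weight = weight
        self.threshold = threshold

    def forward(self, feats):
        # A toy "model" that scales each node's single feature.
        return [self.weight * x for x in feats]

    def predict(self, feats):
        return [int(s > self.threshold) for s in self.forward(feats)]


class F1Evaluator(ToyInstanceEvaluator):
    def compute_score(self, pred, labels):
        # Binary F1 = 2 * precision * recall / (precision + recall).
        tp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 0)
        fn = sum(1 for p, y in zip(pred, labels) if p == 0 and y == 1)
        if tp == 0:
            return {"f1": 0.0}
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return {"f1": 2 * precision * recall / (precision + recall)}


model = ThresholdModel(weight=1.0)
pred = model.predict([0.9, 0.1, 0.7, 0.8])
print(pred)  # [1, 0, 1, 1]
print(F1Evaluator().compute_score(pred, [1, 0, 0, 1]))

# The base class itself cannot be instantiated until its abstract methods exist.
try:
    ToyNodeModelBase()
except TypeError as err:
    print("abstract methods must be implemented:", err)
```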
diff --git a/docs/source/api/graphstorm.dataloading.rst b/docs/source/api/graphstorm.dataloading.rst
@@ -0,0 +1,32 @@
+.. _apidataloading:
+
+graphstorm.dataloading
+==========================
+
+    The GraphStorm dataloading module includes a set of graph datasets and dataloaders for
+    different graph machine learning tasks.
+
+.. currentmodule:: graphstorm.dataloading
+
+Datasets
+------------
+.. autosummary::
+    :toctree: ../generated/
+    :nosignatures:
+    :template: classtemplate.rst
+
+    GSgnnNodeTrainData
+    GSgnnNodeInferData
+    GSgnnEdgeTrainData
+    GSgnnEdgeInferData
+
+Dataloaders
+------------
+.. autosummary::
+    :toctree: ../generated/
+    :nosignatures:
+    :template: classtemplate.rst
+
+    GSgnnNodeDataLoader
+    GSgnnEdgeDataLoader
+    GSgnnLinkPredictionDataLoader
diff --git a/docs/source/api/graphstorm.evaluator.rst b/docs/source/api/graphstorm.evaluator.rst
@@ -0,0 +1,20 @@
+.. _apievaluator:
+
+graphstorm.evaluator
+=======================
+
+    GraphStorm evaluators provide built-in evaluation methods for different Graph Machine
+    Learning (GML) tasks.
+
+.. currentmodule:: graphstorm.eval
+.. autosummary::
+    :toctree: ../generated/
+    :nosignatures:
+    :template: classtemplate.rst
+
+    GSgnnLPEvaluator
+    GSgnnMrrLPEvaluator
+    GSgnnPerEtypeMrrLPEvaluator
+    GSgnnAccEvaluator
+    GSgnnRegressionEvaluator
+
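`GSgnnMrrLPEvaluator` reports mean reciprocal rank (MRR) for link prediction. As a generic illustration of the metric, not GraphStorm's implementation, each positive edge is ranked against its negative edges and MRR averages the reciprocal ranks: MRR = (1/N) * sum over i of 1/rank_i. The hypothetical `mrr` helper below assumes one tie-breaking choice (only strictly higher-scoring negatives push the rank down); other implementations may treat ties differently.

```python
from __future__ import annotations


def mrr(pos_scores: list[float], neg_scores: list[list[float]]) -> float:
    """Mean reciprocal rank of each positive score among its negatives.

    rank = 1 + number of negatives scoring strictly higher than the positive.
    """
    if len(pos_scores) != len(neg_scores):
        raise ValueError("need one list of negative scores per positive edge")
    reciprocal_ranks = []
    for pos, negs in zip(pos_scores, neg_scores):
        rank = 1 + sum(1 for n in negs if n > pos)
        reciprocal_ranks.append(1.0 / rank)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


# Edge 1 outranks both of its negatives (rank 1); edge 2 is beaten by two (rank 3).
print(mrr([0.9, 0.5], [[0.1, 0.2], [0.7, 0.6]]))  # (1 + 1/3) / 2
```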
diff --git a/docs/source/api/graphstorm.inferer.rst b/docs/source/api/graphstorm.inferer.rst
@@ -0,0 +1,20 @@
+.. _apiinferer:
+
+graphstorm.inferer
+====================
+
+    GraphStorm inferers assemble the distributed inference pipelines for different tasks.
+
+    Where possible, users should use these inferers rather than handling distributed
+    processing themselves.
+
+.. currentmodule:: graphstorm.inference
+
+.. autosummary::
+    :toctree: ../generated/
+    :nosignatures:
+    :template: classtemplate.rst
+
+    GSgnnLinkPredictionInfer
+    GSgnnNodePredictionInfer
+    GSgnnEdgePredictionInfer
diff --git a/docs/source/api/graphstorm.model.rst b/docs/source/api/graphstorm.model.rst
@@ -0,0 +1,41 @@
+.. _apimodel:
+
+graphstorm.model
+=================
+
+    A GraphStorm model normally contains three components:
+
+    * Input layer: a set of modules to convert input data for different use cases,
+      e.g., embedding text features.
+    * Encoder: a set of Graph Neural Network (GNN) modules, e.g., RGCN, RGAT, GraphSAGE, or HGT.
+    * Decoder: a set of modules to convert results from encoders for different tasks,
+      e.g., classification, regression, or link prediction.
+
+.. currentmodule:: graphstorm.model
+
+Model input layers
+-------------------
+.. autosummary::
+    :toctree: ../generated/
+    :nosignatures:
+    :template: classtemplate.rst
+
+    GSNodeEncoderInputLayer
+    GSLMNodeEncoderInputLayer
+    GSPureLMNodeInputLayer
+
+Model encoders and layers
+--------------------------
+.. autosummary::
+    :toctree: ../generated/
+    :nosignatures:
+    :template: classtemplate.rst
+
+    RelationalGCNEncoder
+    RelGraphConvLayer
+    RelationalGATEncoder
+    RelationalAttLayer
+    SAGEEncoder
+    SAGEConv
+    HGTEncoder
+    HGTLayer
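The three-stage structure described at the top of the model page (input layer, encoder, decoder) can be pictured as a composition of three functions. The following is a toy, dependency-free sketch: `input_layer`, `mean_aggregate_encoder`, and `linear_decoder` are hypothetical names, and the encoder is a single mean-aggregation step standing in for the GNN encoders listed above, not an implementation of any of them.

```python
from __future__ import annotations


def input_layer(raw_feats: dict[int, list[float]], scale: float) -> dict[int, list[float]]:
    # Stand-in for a feature-projection input layer: scale each raw feature.
    return {n: [scale * x for x in f] for n, f in raw_feats.items()}


def mean_aggregate_encoder(
    h: dict[int, list[float]], adj: dict[int, list[int]]
) -> dict[int, list[float]]:
    # One message-passing step: average a node's embedding with its neighbours'.
    out = {}
    for node, vec in h.items():
        group = [vec] + [h[m] for m in adj.get(node, [])]
        out[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return out


def linear_decoder(h: dict[int, list[float]], weights: list[float]) -> dict[int, float]:
    # Task head: one regression score per node.
    return {n: sum(w * x for w, x in zip(weights, v)) for n, v in h.items()}


feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
adj = {0: [1], 1: [0, 2], 2: [1]}
scores = linear_decoder(mean_aggregate_encoder(input_layer(feats, 2.0), adj), [1.0, 1.0])
print(scores)
```

Keeping the stages separate is what makes partial reuse possible, for example restoring only the input layer and encoder while training a new decoder.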
diff --git a/docs/source/api/graphstorm.rst b/docs/source/api/graphstorm.rst
@@ -0,0 +1,21 @@
+.. _apigraphstorm:
+
+.. currentmodule:: graphstorm
+
+graphstorm
+============
+
+    The ``graphstorm`` package contains a set of functions for environment setup.
+    Users can call these functions directly, for example:
+
+    >>> import graphstorm as gs
+    >>> gs.initialize()
+    >>> gs.get_rank()
+
+.. autosummary::
+    :toctree: ../generated/
+
+    gsf.initialize
+    gsf.get_feat_size
+    utils.get_rank
+    utils.get_world_size
diff --git a/docs/source/api/graphstorm.trainer.rst b/docs/source/api/graphstorm.trainer.rst
@@ -0,0 +1,42 @@
+.. _apitrainer:
+
+graphstorm.trainer
+=====================
+
+    GraphStorm trainers assemble the distributed training pipelines for different tasks or
+    training methods.
+
+    Where possible, users should use these trainers rather than handling distributed
+    processing themselves.
+
+.. currentmodule:: graphstorm.trainer
+
+
+Base class
+--------------
+.. autosummary::
+    :toctree: ../generated/
+    :nosignatures:
+    :template: classtemplate.rst
+
+    GSgnnTrainer
+
+Task classes
+-----------------
+.. autosummary::
+    :toctree: ../generated/
+    :nosignatures:
+    :template: classtemplate.rst
+
+    GSgnnLinkPredictionTrainer
+    GSgnnNodePredictionTrainer
+    GSgnnEdgePredictionTrainer
+
+Method classes
+-----------------
+.. autosummary::
+    :toctree: ../generated/
+    :nosignatures:
+    :template: classtemplate.rst
+
+    GLEMNodePredictionTrainer
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -3,26 +3,45 @@
 # For the full list of built-in configuration values, see the documentation:
 # https://www.sphinx-doc.org/en/master/usage/configuration.html
 
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+import os
+import sys
+
+sys.path.insert(0, os.path.abspath("../../python"))
+
 # -- Project information -----------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
 
+import graphstorm
+
 project = 'GraphStorm'
 copyright = '2023, AGML team'
 author = 'AGML team'
-release = '0.1.2'
+version = graphstorm.__version__
+release = graphstorm.__version__
 
 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
 
-extensions = []
-
+extensions = [
+    "sphinx.ext.duration",
+    "sphinx.ext.doctest",
+    "sphinx.ext.autodoc",
+    "sphinx.ext.autosummary",
+    "sphinx.ext.coverage",
+    "sphinx.ext.mathjax",
+]
 templates_path = ['_templates']
 exclude_patterns = []
 
 
-
 # -- Options for HTML output -------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
 
-html_theme = 'furo'
+html_theme = 'sphinx_rtd_theme'
 html_static_path = ['_static']
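The new `conf.py` makes `graphstorm` importable by prepending `../../python` (made absolute) to `sys.path`, then reads `version` and `release` from `graphstorm.__version__` instead of hard-coding them. Sphinx executes `conf.py` from the configuration directory, which is why the relative path resolves to the repository's `python/` folder. The sketch below checks the same mechanism with a throwaway package; `toypkg` and the temporary layout are hypothetical stand-ins for `python/graphstorm`, and the path is joined explicitly instead of changing directory.

```python
import importlib
import os
import sys
import tempfile

# Throwaway layout: <tmp>/python/toypkg/__init__.py defines __version__.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "python", "toypkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write('__version__ = "0.2"\n')

# conf.py would live in <tmp>/docs/source, so "../../python" points at <tmp>/python.
conf_dir = os.path.join(root, "docs", "source")
os.makedirs(conf_dir)
sys.path.insert(0, os.path.abspath(os.path.join(conf_dir, "../../python")))
importlib.invalidate_caches()

toypkg = importlib.import_module("toypkg")
version = toypkg.__version__
release = toypkg.__version__
print(version, release)  # 0.2 0.2
```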
diff --git a/docs/source/configuration/configuration-run.rst b/docs/source/configuration/configuration-run.rst
@@ -118,8 +118,7 @@ GraphStorm provides a set of parameters to control how and where to save and res
     - Yaml: ``restore_model_path: /model/checkpoint/``
     - Argument: ``--restore-model-path /model/checkpoint/``
     - Default value: This parameter must be provided if users want to restore a saved model.
-- **restore_model_layers**: Specify which GraphStorm neural network layers to load. This argument is useful when a user wants to pre-train a GraphStorm model using link prediction and fine-tune the same model on a node or edge classification/regression task.
-Currently, three neural network layers are supported, i.e., ``embed`` (input layer), ``gnn`` and ``decoder``. A user can select one or more layers to load.
+- **restore_model_layers**: Specify which GraphStorm neural network layers to load. This argument is useful when a user wants to pre-train a GraphStorm model using link prediction and fine-tune the same model on a node or edge classification/regression task. Currently, three neural network layers are supported, i.e., ``embed`` (input layer), ``gnn`` and ``decoder``. A user can select one or more layers to load.
     - Yaml: ``restore_model_layers: embed``
     - Argument: ``--restore-model-layers embed,gnn``
     - Default value: Load all neural network layers
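`--restore-model-layers` takes a comma-separated subset of `embed`, `gnn`, and `decoder`, and loads every layer when it is omitted. A minimal sketch of that parsing and validation, assuming a hypothetical `parse_restore_layers` helper; GraphStorm's own argument handling may differ in details such as whitespace or error messages.

```python
from __future__ import annotations

SUPPORTED_LAYERS = ("embed", "gnn", "decoder")


def parse_restore_layers(value: str | None) -> list[str]:
    """Turn 'embed,gnn' into ['embed', 'gnn']; None means all layers."""
    if value is None:
        return list(SUPPORTED_LAYERS)
    layers = [part.strip() for part in value.split(",") if part.strip()]
    unknown = [layer for layer in layers if layer not in SUPPORTED_LAYERS]
    if unknown:
        raise ValueError(f"unsupported layers {unknown}; choose from {SUPPORTED_LAYERS}")
    return layers


# Fine-tuning after link-prediction pre-training: reuse the input layer and GNN,
# but train a new decoder for the node or edge task.
print(parse_restore_layers("embed,gnn"))  # ['embed', 'gnn']
print(parse_restore_layers(None))  # ['embed', 'gnn', 'decoder']
```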

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -34,10 +34,18 @@ Welcome to the GraphStorm Documentation and Tutorials
 
 .. toctree::
    :maxdepth: 2
-   :caption: API Reference:
+   :caption: API Reference
    :hidden:
    :glob:
 
+   api/graphstorm
+   api/graphstorm.dataloading
+   api/graphstorm.model
+   api/graphstorm.trainer
+   api/graphstorm.inferer
+   api/graphstorm.evaluator
+   api/graphstorm.customized
+
 GraphStorm is a graph machine learning (GML) framework designed for enterprise use cases. It simplifies the development, training and deployment of GML models on industry-scale graphs (measured in billions of nodes and edges) by providing scalable training and inference pipelines for GML models. GraphStorm comes with a collection of built-in GML models, allowing users to train a GML model with a single command, eliminating the need to write any code. Moreover, GraphStorm provides a wide range of configurations to customize model implementations and training pipelines to improve model performance. In addition, GraphStorm offers a programming interface that enables users to train custom GML models in a distributed manner. Users can bring their own model implementations and leverage the GraphStorm training pipeline for scalability.
 
 Getting Started

diff --git a/python/graphstorm/__init__.py b/python/graphstorm/__init__.py
@@ -17,6 +17,8 @@
 """
 __version__ = "0.2"
 
+from . import gsf
+from . import utils
 from .utils import get_rank, get_world_size
 from .gsf import initialize, get_feat_size
 from .gsf import create_builtin_node_gnn_model
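With `from . import gsf` and `from . import utils`, the submodules become explicit attributes of the `graphstorm` package, so dotted names in `api/graphstorm.rst` such as `gsf.initialize` and `utils.get_rank` can be looked up relative to `.. currentmodule:: graphstorm`. The sketch builds a throwaway package (`toygs`, hypothetical) with the same `__init__.py` shape and resolves such names attribute by attribute with `getattr`; it illustrates the lookup involved, not autosummary's exact import logic.

```python
import functools
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, "toygs")
os.makedirs(pkg)
files = {
    "gsf.py": "def initialize():\n    return 'initialized'\n",
    "utils.py": "def get_rank():\n    return 0\n",
    # Same shape as the updated graphstorm/__init__.py.
    "__init__.py": (
        "from . import gsf\n"
        "from . import utils\n"
        "from .utils import get_rank\n"
        "from .gsf import initialize\n"
    ),
}
for name, body in files.items():
    with open(os.path.join(pkg, name), "w") as f:
        f.write(body)

sys.path.insert(0, root)
importlib.invalidate_caches()
toygs = importlib.import_module("toygs")


def resolve(module, dotted: str):
    # Walk "gsf.initialize" one attribute at a time, starting from the package.
    return functools.reduce(getattr, dotted.split("."), module)


print(resolve(toygs, "gsf.initialize")())  # initialized
print(resolve(toygs, "utils.get_rank") is toygs.get_rank)  # True
```

Importing a submodule already binds it on its parent package as a side effect, so the explicit imports mainly make that exposure intentional and visible.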