NVIDIA
diff --git a/.github/eks-workflow-files/axlearn/axlearn-fuji-model.yml b/.github/eks-workflow-files/axlearn/axlearn-fuji-model.yml
-4
diff --git a/.github/eks-workflow-files/axlearn/axlearn-job.yml b/.github/eks-workflow-files/axlearn/axlearn-job.yml
+1 -1
diff --git a/docs/frameworks/axlearn/README.md b/docs/frameworks/axlearn/README.md
+40
diff --git a/rosetta/rosetta/projects/axlearn/README.md b/rosetta/rosetta/projects/axlearn/README.md
-59
diff --git a/rosetta/rosetta/projects/axlearn/scripts/eks-fuji.yaml b/rosetta/rosetta/projects/axlearn/scripts/eks-fuji.yaml
-66
diff --git a/rosetta/rosetta/projects/axlearn/scripts/multinode.py b/rosetta/rosetta/projects/axlearn/scripts/multinode.py
-71
@@ -30,17 +30,13 @@ spec:
                       AG_THRESHOLD=8589934592
                       RS_THRESHOLD=8589934592
                       BASE_XLA_FLAGS=${BASE_XLA_FLAGS:---xla_gpu_enable_latency_hiding_scheduler=true
-                          --xla_gpu_enable_highest_priority_async_stream=true
                           --xla_gpu_all_reduce_combine_threshold_bytes=1073741824
                           --xla_gpu_all_gather_combine_threshold_bytes=1073741824
                           --xla_gpu_reduce_scatter_combine_threshold_bytes=1073741824
                           --xla_gpu_enable_pipelined_all_gather=true
                           --xla_gpu_enable_pipelined_reduce_scatter=true
                           --xla_gpu_enable_pipelined_all_reduce=true
                           --xla_gpu_enable_while_loop_double_buffering=true
-                          --xla_gpu_enable_triton_gemm=false
-                          --xla_gpu_enable_all_gather_combine_by_dim=false
-                          --xla_gpu_enable_reduce_scatter_combine_by_dim=false
                           --xla_disable_hlo_passes=rematerialization}
 
                       export XLA_FLAGS="$BASE_XLA_FLAGS ${XLA_FLAGS:-}" 
 
@@ -51,7 +51,7 @@ spec:
                       # Zip the results of all the tests 
                       tar -czf test_logs.tar.gz /opt/output
                       # Upload logs to S3 bucket
-                      aws s3 cp /opt/output/summary.txt s3://jax-toolbox-eks-output/axlearn/${RUN_ID}/test_logs.tar.gz
+                      aws s3 cp test_logs.tar.gz s3://jax-toolbox-eks-output/axlearn/${RUN_ID}/test_logs.tar.gz
                   volumeMounts:
                     - name: output
                       mountPath: /opt/output
 
@@ -0,0 +1,40 @@
+# AXLearn
+[AXLearn](https://github.com/apple/axlearn) is a deep learning design framework, built on top of JAX and XLA, to support the development of large-scale models. 
+
+
+## Hardware and Software Specifications
+
+This functionality has been validated on an AWS EKS cluster of p5.48xlarge nodes (8x H100 80GB).
+
+
+## Containers
+We provide multi-architecture containers that are regularly updated. Use these containers to avoid dependency and environment issues.
+- Latest container: ghcr.io/nvidia/jax:axlearn
+- Nightly dated container: ghcr.io/nvidia/jax:axlearn-YYYY-MM-DD
+
+When you start an interactive session:
+
+- Navigate to `/opt/axlearn` inside the container.
+- Place your persistent files in a mounted directory (e.g. `/opt/axlearn/workspace`).
+
+## Launching a container
+Use the following command to launch a container:
+```bash
+docker run -ti --gpus=all --net=host --ipc=host -v <WORKSPACE_PATH>:/opt/axlearn/workspace -w /opt/axlearn <CONTAINER> /bin/bash
+```
+where `<WORKSPACE_PATH>` is the host directory where you want to store persistent files and `<CONTAINER>` is the AXLearn container image (e.g. `ghcr.io/nvidia/jax:axlearn`). You can also mount dataset and vocabulary paths with additional `-v` flags.
+
+## Example: training `fuji-3B-v3-flash-single-host` on EKS
+[This YAML file](../../../.github/eks-workflow-files/axlearn/axlearn-fuji-model.yml) deploys training of the Fuji-3B model, which uses flash attention and runs on a single host. The core of the deployment is the following command:
+```bash 
+python3 -m axlearn.common.launch_trainer_main \
+        --module=text.gpt.c4_trainer \
+        --config=${CONFIG} \
+        --trainer_dir=${TRAINER_DIR} \
+        --data_dir=gs://axlearn-public/tensorflow_datasets \
+        --jax_backend=gpu             
+```
+where `CONFIG=fuji-3B-v3-flash-single-host`. The input is the public TensorFlow [C4 dataset](https://www.tensorflow.org/datasets/catalog/c4).
+
+## Testing
+[This YAML file](../../../.github/eks-workflow-files/axlearn/axlearn-job.yml) is used to test AXLearn functionality. It runs the [`test-axlearn.sh` script](../../../.github/container/test-axlearn.sh), which runs `pytest` against all the tests contained in the `/opt/axlearn/axlearn/common` folder.
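The `XLA_FLAGS` assembly in the `axlearn-fuji-model.yml` hunk can be checked in isolation. Below is a minimal POSIX shell sketch of that logic after this change: `BASE_XLA_FLAGS` falls back to the default list only when it is unset or empty, and any caller-supplied `XLA_FLAGS` are appended after it. The helper name `build_xla_flags` is hypothetical (the workflow inlines this logic), and collapsing the multi-line default into single spaces is an assumption made here for readability; the workflow itself keeps the newlines.

```shell
#!/bin/sh
# Sketch (assumption): mirrors the BASE_XLA_FLAGS / XLA_FLAGS assembly in the
# axlearn-fuji-model.yml workflow after this change. build_xla_flags is a
# hypothetical helper; the workflow inlines this logic in its run script.
set -eu

build_xla_flags() {
  # ${VAR:-default}: keep the caller's BASE_XLA_FLAGS if it is set and
  # non-empty, otherwise fall back to the defaults left by this diff.
  base="${BASE_XLA_FLAGS:---xla_gpu_enable_latency_hiding_scheduler=true
    --xla_gpu_all_reduce_combine_threshold_bytes=1073741824
    --xla_gpu_all_gather_combine_threshold_bytes=1073741824
    --xla_gpu_reduce_scatter_combine_threshold_bytes=1073741824
    --xla_gpu_enable_pipelined_all_gather=true
    --xla_gpu_enable_pipelined_reduce_scatter=true
    --xla_gpu_enable_pipelined_all_reduce=true
    --xla_gpu_enable_while_loop_double_buffering=true
    --xla_disable_hlo_passes=rematerialization}"
  # Split on whitespace with globbing disabled, so the multi-line default and
  # any caller-supplied XLA_FLAGS become one flat list of words.
  set -f
  set -- $base ${XLA_FLAGS:-}
  set +f
  # "$*" joins the words with single spaces; caller flags come last.
  printf '%s\n' "$*"
}

# Usage: defaults only, then defaults plus an extra debugging flag.
build_xla_flags
XLA_FLAGS="--xla_dump_to=/tmp/xla_dump" build_xla_flags
```

Because the caller's flags are appended rather than merged, setting `BASE_XLA_FLAGS` replaces the whole default list, while setting `XLA_FLAGS` only adds to it.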