Commit f65a5f6

Add docs to achieve 200K scale. (#7812)
* Add docs to achieve 200K scale.
* Addressed review comments

---------

Co-authored-by: Mattermost Build <[email protected]>
1 parent 80bdd2a commit f65a5f6

2 files changed: +69 -2 lines changed

+64
@@ -0,0 +1,64 @@
+Scale Mattermost up to 200000 users
+====================================
+
+.. include:: ../_static/badges/ent-selfhosted.rst
+  :start-after: :nosearch:
+
+This page describes the Mattermost reference architecture designed for a load of up to 200000 concurrent users. Unsure which reference architecture to use? See the :doc:`scaling for enterprise </scale/scaling-for-enterprise>` documentation for details.
+
+- **High Availability**: Required
+- **Database Configuration**: writer, multiple readers
+
+.. note::
+  - Usage of CPU, RAM, and storage space can vary significantly based on user behavior. These hardware recommendations are based on traditional deployments and may grow or shrink depending on how active your users are.
+  - From Mattermost v10.4, Mattermost Enterprise customers can configure `Redis <https://redis.io/>`_ (Remote Dictionary Server) as an alternative cache backend. Using Redis can help ensure that Mattermost remains performant and efficient, even under heavy usage. See the :ref:`Redis cache backend <configure/environment-configuration-settings:redis cache backend>` configuration settings documentation for details.
+  - While the following Elasticsearch specifications may be more than sufficient for some use cases, we have not extensively tested configurations with lower resource allocations for this user scale. If cost optimization is a priority, admins may choose to experiment with smaller configurations, but we recommend starting with the tested specifications to ensure system stability and performance. Keep in mind that under-provisioning can lead to degraded user experience and additional troubleshooting effort.
+
+Requirements
+------------
+
++------------------------+-----------+----------------+-----------------------+
+| **Resource Type**      | **Nodes** | **vCPU/        | **AWS Instance**      |
+|                        |           | Memory (GiB)** |                       |
++========================+===========+================+=======================+
+| Mattermost Application | 14        | 16/32          | c7i.4xlarge           |
++------------------------+-----------+----------------+-----------------------+
+| RDS Writer             | 1         | 16/128         | db.r7g.4xlarge        |
++------------------------+-----------+----------------+-----------------------+
+| RDS Reader             | 6         | 16/128         | db.r7g.4xlarge        |
++------------------------+-----------+----------------+-----------------------+
+| Elasticsearch cluster  | 4         | 8/64           | r6g.2xlarge.search    |
++------------------------+-----------+----------------+-----------------------+
+| Proxy                  | 4         | 32/128         | m6in.8xlarge          |
++------------------------+-----------+----------------+-----------------------+
+| Redis                  | 1         | 8/32           | cache.m7g.2xlarge     |
++------------------------+-----------+----------------+-----------------------+
+
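To get a feel for the overall footprint of this architecture, the per-tier figures in the requirements table can be tallied. The following is a minimal sketch, not part of the committed page: the `Tier` class and `totals` helper are hypothetical names introduced only for illustration, and the numbers are copied from the table as-is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One row of the requirements table (hypothetical helper type)."""
    name: str
    nodes: int
    vcpu: int        # vCPU per node
    memory_gib: int  # memory per node, in GiB


# Values copied from the requirements table.
TIERS = [
    Tier("Mattermost Application", 14, 16, 32),
    Tier("RDS Writer", 1, 16, 128),
    Tier("RDS Reader", 6, 16, 128),
    Tier("Elasticsearch cluster", 4, 8, 64),
    Tier("Proxy", 4, 32, 128),
    Tier("Redis", 1, 8, 32),
]


def totals(tiers):
    """Return (node count, total vCPU, total memory in GiB) across all tiers."""
    nodes = sum(t.nodes for t in tiers)
    vcpu = sum(t.nodes * t.vcpu for t in tiers)
    memory = sum(t.nodes * t.memory_gib for t in tiers)
    return nodes, vcpu, memory


if __name__ == "__main__":
    nodes, vcpu, memory = totals(TIERS)
    print(f"{nodes} nodes, {vcpu} vCPU, {memory} GiB")
```

Under these figures the deployment comes to 30 nodes, 504 vCPU, and 2144 GiB of memory in total. Actual needs may differ, since the page notes that usage varies with user behavior.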
+Lifetime storage
+----------------
+
+.. include:: ../scale/lifetime-storage.rst
+  :start-after: :nosearch:
+
+Estimated storage per user, per month
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. include:: ../scale/estimated-storage-per-user-per-month.rst
+  :start-after: :nosearch:
+
+Example
+~~~~~~~
+
+A 200000-person team with medium usage (with a safety factor of 2x) would require between 24 TB :sup:`1` and 120 TB :sup:`2` of free space per annum (taking 1 TB = 1,000,000 MB).
+
+:sup:`1` 200000 users * 5 MB * 12 months * 2x safety factor = 24,000,000 MB
+
+:sup:`2` 200000 users * 25 MB * 12 months * 2x safety factor = 120,000,000 MB
+
+We strongly recommend that you review storage utilization at least quarterly to ensure adequate free space is available.
+
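The two footnote formulas in the example can be evaluated directly. The sketch below is illustrative only: `annual_storage_tb` is a hypothetical helper, and the conversion factor of 1 TB = 1,000,000 MB (decimal units) is an assumption, since the page does not state which unit convention it uses.

```python
# Assumption: decimal units, 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000


def annual_storage_tb(users, mb_per_user_per_month, months=12, safety_factor=2):
    """Evaluate users * MB per user per month * months * safety factor, in TB."""
    total_mb = users * mb_per_user_per_month * months * safety_factor
    return total_mb / MB_PER_TB


# Lower and upper per-user monthly estimates taken from the footnotes (5 MB and 25 MB).
low_tb = annual_storage_tb(200_000, 5)
high_tb = annual_storage_tb(200_000, 25)

if __name__ == "__main__":
    print(f"low: {low_tb} TB, high: {high_tb} TB")
```

With decimal units the formulas evaluate to 24 TB and 120 TB per year. If binary units are assumed instead (1 TiB = 1,048,576 MiB), the same formulas give roughly 22.9 and 114.4.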
+Additional considerations
+-------------------------
+
+.. include:: ../scale/additional-ha-considerations.rst
+  :start-after: :nosearch:

source/scale/scaling-for-enterprise.rst

+5 -2
@@ -22,6 +22,7 @@ The following reference architectures are available as recommended starting poin
 * :doc:`Scale up to 80000 users </scale/scale-to-80000-users>` - Learn how to scale Mattermost to up to 80000 users.
 * :doc:`Scale up to 90000 users </scale/scale-to-90000-users>` - Learn how to scale Mattermost to up to 90000 users.
 * :doc:`Scale up to 100000 users </scale/scale-to-100000-users>` - Learn how to scale Mattermost to up to 100000 users.
+* :doc:`Scale up to 200000 users </scale/scale-to-200000-users>` - Learn how to scale Mattermost to up to 200000 users.

 .. important::

@@ -32,10 +33,12 @@ Testing methodology and updates

 All tests were executed with the custom load test tool built by the Mattermost development teams to determine supported users for each deployment size. Over time, this guide will be updated with new deployment sizes and deployment architectures, and newer versions of the Mattermost Server will be tested using an ESR.

-At a high level, each deployment size was fixed (Mattermost server node count/sizing, database reader/writer count/sizing), and unbounded tests were used to report the maximum numbers of concurrent users the deployment can support. Each test included populated PostgreSQL v14 databases and a post table history of 100 million posts, ~3000 users, 20 teams, and ~720000 channels to provide a test simulation of a production Mattermost deployment.
+At a high level, each deployment size was fixed (Mattermost server node count/sizing, database reader/writer count/sizing), and unbounded tests were used to report the maximum numbers of concurrent users the deployment can support. Each test included populated PostgreSQL v14 databases and a post table history of 100 million posts, ~200000 users, 20 teams, and ~720000 channels to provide a test simulation of a production Mattermost deployment.

 Tests were defined by configuration of the actions executed by each simulated user (and the frequency of these actions) where the coordinator metrics define a healthy system under load. Tests were performed using the Mattermost v9.5 Extended Support Release (ESR). Job servers weren't used. All tests with more than a single app node had an NGINX proxy running in front of them.

+For the final test at 200K users, further infrastructure changes were made: Elasticsearch nodes and a Redis instance were added, and multiple NGINX proxies were used to distribute traffic evenly across all nodes in the cluster. See the :doc:`Scale up to 200000 users </scale/scale-to-200000-users>` page for details.
+
 Full testing methodology, configuration, and setup are available, including a `fixed database dump with 100 million posts <https://us-east-1.console.aws.amazon.com/backup/home?region=us-east-1#/resources/arn%3Aaws%3Ards%3Aus-east-1%3A729462591288%3Acluster%3Adb-pg-100m-posts-v9-5-5>`_. Visit the `Mattermost Community <https://community.mattermost.com/>`_ and join the `Developers: Performance channel <https://community.mattermost.com/core/channels/developers-performance>`_ for details.

 Mattermost load testing tools
@@ -49,4 +52,4 @@ Visit the `Mattermost Load Test Tool <https://github.com/mattermost/mattermost-l

 - The Mattermost Load Test Tool was designed by and is used by our performance engineers to compare and benchmark the performance of the service from month to month to prepare for new releases. It's also used extensively in developing our recommended hardware sizing.
 - We recommend deploying :doc:`Prometheus and Grafana </scale/deploy-prometheus-grafana-for-performance-monitoring>` with our :ref:`dashboards <scale/deploy-prometheus-grafana-for-performance-monitoring:getting started>` for ongoing monitoring and scale guidance.
-- If you encounter performance concerns, we recommend :doc:`collecting performance metrics </scale/collect-performance-metrics>` and sharing them with us as a first troubleshooting step.
+- If you encounter performance concerns, we recommend :doc:`collecting performance metrics </scale/collect-performance-metrics>` and sharing them with us as a first troubleshooting step.
