Miscellaneous operations doc fixes (#5598)
jumaffre authored Aug 31, 2023
1 parent d6f54e0 commit 1454542
Showing 8 changed files with 38 additions and 34 deletions.
25 changes: 12 additions & 13 deletions doc/operations/certificates.rst
@@ -3,7 +3,7 @@ Certificates

This page describes how the validity period of node and service certificates can be set by operators, and renewed by members.

.. note:: The granularity for the validity period of nodes and service certificates is one day.
.. note:: The granularity of the validity period of nodes and service certificates is one day.

.. tip:: See :ref:`architecture/cryptography:Identity Keys and Certificates` for a detailed explanation of the relationship between keys and certificates in CCF.

@@ -16,7 +16,7 @@ The ``command.start.service_configuration.maximum_node_certificate_validity_days

Service-endorsed and self-signed node certificates are set with identical validity periods throughout the lifetime of a node. These certificates are presented to clients by ``Service`` and ``Node`` endorsed RPC interfaces, respectively (see ``rpc_interfaces.endorsement`` :doc:`configuration entry </operations/configuration>`).

.. tip:: Once a node certificate has expired, clients will no longer trust the node serving their request. It is expected that operators and members will monitor the certificate validity dates with regards to current time and renew the node certificate before expiration. See :ref:`governance/common_member_operations:Renewing Node Certificate` for more details.
.. tip:: Once a node certificate has expired, clients will no longer trust the node serving their request. It is expected that operators and members will monitor the certificate validity dates with regard to current time and renew the node certificate before expiration. See :ref:`governance/common_member_operations:Renewing Node Certificate` for more details.
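The monitoring described in the tip above could be sketched as follows. This is a minimal illustration, not part of CCF: it assumes the certificate's ``notAfter`` value has already been extracted as a string in OpenSSL's text format (for instance with ``openssl x509 -enddate -noout``), and the helper names ``days_until_expiry`` and ``needs_renewal`` as well as the 30-day threshold are hypothetical.

```python
import ssl
import time

SECONDS_PER_DAY = 24 * 60 * 60


def days_until_expiry(not_after, now=None):
    """Return the whole days left before `not_after`.

    `not_after` uses OpenSSL's text format, e.g. "Jan  1 00:00:00 2024 GMT".
    `now` is a Unix timestamp; it defaults to the current time.
    """
    if now is None:
        now = time.time()
    expiry = ssl.cert_time_to_seconds(not_after)
    # Floor division: a certificate that expired an hour ago reports -1.
    return int((expiry - now) // SECONDS_PER_DAY)


def needs_renewal(not_after, threshold_days=30, now=None):
    """True when fewer than `threshold_days` remain (threshold is an assumption)."""
    return days_until_expiry(not_after, now) < threshold_days
```

Since the validity period has a granularity of one day, reporting whole days is sufficient; an operator would raise a renewal proposal once ``needs_renewal`` returns true.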

The procedure that operators and members should follow is summarised in the following example. A 3-node service is started by operators and the initial certificate validity period is set by ``node_certificate.initial_validity_days`` (grey). Before these certificates expire, the service is opened by members who renew the certificate for each node, via the ``set_all_nodes_certificate_validity`` proposal action, either standalone or bundled with the existing ``transition_service_to_open`` action. When a new node (3) joins the service, members should set the validity period for its certificate when submitting the ``transition_node_to_trusted`` proposal. Finally, operators and members should issue a new proposal to renew soon-to-expire node certificates (red).

@@ -87,12 +87,11 @@ Unendorsed, self-signed (CA) service certificates are a complication for clients

1. Get a globally reachable DNS name for your CCF network, e.g. ``my-ccf.example.com``, which resolves to the address of at least one node in the network. Multiple nodes or a load balancer address are fine too.

2. ACME `http-01 <https://letsencrypt.org/docs/challenge-types/>`_ challenges require a challenge server to be reachable on port 80 (non-negotiable).
To be able to bind to that port, the ``cchost`` binary may need to be given special permission, e.g. by running ``sudo setcap CAP_NET_BIND_SERVICE=+eip cchost``. Alternatively, port 80 can be redirected to a non-privileged port that ``cchost`` may bind to without special permission.
2. ACME `http-01 <https://letsencrypt.org/docs/challenge-types/>`_ challenges require a challenge server to be reachable on port 80 (non-negotiable). To be able to bind to that port, the ``cchost`` binary may need to be given special permission, e.g. by running ``sudo setcap CAP_NET_BIND_SERVICE=+eip cchost``. Alternatively, port 80 can be redirected to a non-privileged port that ``cchost`` may bind to without special permission.

3. Each interface defined in the ``cchost`` configuration file can be given the name of an ACME configuration to use. The settings of each ACME configuration are defined in ``network.acme`` :doc:`configuration entry </operations/configuration>`. Note that this information is required by *all* nodes as they might have to renew the certificate(s) later. Further, an additional interface for the challenge server is required.

The various options are as follows:
The various options are as follows:

.. code-block:: python
@@ -132,13 +131,13 @@ To be able to bind to that port, the ``cchost`` binary may need to be given spec
}
- ``ca_certs``: CCF will need to establish https connections with the CA, but does not come with root certificates by default and therefore will fail to establish connections. This setting is populated with one or more such certificates; e.g. for Let's Encrypt this would be their ISRG Root X1 certificate (see `here <https://letsencrypt.org/certificates/>`_) in PEM format.
- ``directory_url``: This is the main entry point for the ACME protocol. For Let's Encrypt's `staging environment <https://letsencrypt.org/docs/staging-environment/>`_, this is ``https://acme-staging-v02.api.letsencrypt.org/directory``; for their production environment, drop the ``-staging``.
- ``service_dns_name``: The DNS name for the network from step 1.
- ``alternative_names``: Alternative names for the service we represent (X509 SANs).
- ``contact``: A list of contact addresses, usually e-mail addresses, which must be prefixed with ``mailto:``. These contacts may receive notifications about service changes, e.g. certificate revocation or expiry.
- ``terms_of_service_agreed``: A Boolean confirming that the operator accepts the terms of service for the CA. RFC8555 requires this to be set explicitly by the operator.
- ``challenge_type``: Currently only `http-01 <https://letsencrypt.org/docs/challenge-types/>`_ is supported.
- ``challenge_server_interface``: Name of the interface that the ACME challenge server listens on. For http-01 challenges in production, this interface must be exposed publicly on port 80.
- ``ca_certs``: CCF will need to establish HTTPS connections with the CA, but does not come with root certificates by default and therefore will fail to establish connections. This setting is populated with one or more such certificates; e.g. for Let's Encrypt this would be their ISRG Root X1 certificate (see `here <https://letsencrypt.org/certificates/>`_) in PEM format.
- ``directory_url``: This is the main entry point for the ACME protocol. For Let's Encrypt's `staging environment <https://letsencrypt.org/docs/staging-environment/>`_, this is ``https://acme-staging-v02.api.letsencrypt.org/directory``; for their production environment, drop the ``-staging``.
- ``service_dns_name``: The DNS name for the network from step 1.
- ``alternative_names``: Alternative names for the service we represent (X509 SANs).
- ``contact``: A list of contact addresses, usually e-mail addresses, which must be prefixed with ``mailto:``. These contacts may receive notifications about service changes, e.g. certificate revocation or expiry.
- ``terms_of_service_agreed``: A Boolean confirming that the operator accepts the terms of service for the CA. RFC8555 requires this to be set explicitly by the operator.
- ``challenge_type``: Currently only `http-01 <https://letsencrypt.org/docs/challenge-types/>`_ is supported.
- ``challenge_server_interface``: Name of the interface that the ACME challenge server listens on. For http-01 challenges in production, this interface must be exposed publicly on port 80.

4. CCF nodes periodically check for certificate expiry and trigger renewal when 66% of the validity period has elapsed. The resulting certificates are stored in the ``ccf.gov.service.acme_certificates`` table and upon an update to this table, nodes will automatically install the corresponding certificate on their interfaces. If necessary, renewal can also be triggered manually by submitting a ``trigger_acme_refresh`` governance proposal.
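The renewal rule in step 4 comes down to simple date arithmetic. The sketch below only illustrates that arithmetic; it is not CCF's implementation, and the function names ``renewal_due_at`` and ``renewal_due`` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# From the text: renewal is triggered once 66% of the validity period has elapsed.
RENEWAL_FRACTION = 0.66


def renewal_due_at(not_before, not_after, fraction=RENEWAL_FRACTION):
    """Point in time at which `fraction` of the validity period has elapsed."""
    return not_before + (not_after - not_before) * fraction


def renewal_due(not_before, not_after, now):
    """True once `now` has reached the renewal point."""
    return now >= renewal_due_at(not_before, not_after)


if __name__ == "__main__":
    start = datetime(2023, 1, 1, tzinfo=timezone.utc)
    end = start + timedelta(days=90)  # illustrative 90-day certificate
    print(renewal_due_at(start, end))
```

For a certificate with a 90-day validity period, this places the renewal point a little under 60 days after issuance.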
2 changes: 1 addition & 1 deletion doc/operations/code_upgrade.rst
@@ -174,7 +174,7 @@ Procedure

5. If necessary, the constitution scripts and JavaScript/TypeScript application bundles should be updated via governance:

- Members should be use the ``set_constitution`` proposal action to update the constitution scripts.
- Members should use the ``set_constitution`` proposal action to update the constitution scripts.
- See :ref:`bundle deployment procedure <build_apps/js_app_bundle:Deployment>` to update the JavaScript/TypeScript application.

6. Finally, once the code upgrade process has been successful, the old code version (i.e. the code version run by nodes 0, 1 and 2) can be removed using the ``remove_node_code`` or ``remove_snp_host_data`` proposal actions.
17 changes: 12 additions & 5 deletions doc/operations/index.rst
@@ -29,7 +29,14 @@ This section describes how :term:`Operators` manage the different nodes constitu
:fa:`upload` :doc:`ledger_snapshot`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Provision a new execution node for an existing service quickly from a state snapshot.
Understand how to back up ledger files and provision new nodes from a state snapshot.

---

:fa:`database` :doc:`data_persistence`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Best practices and durability guarantees for ledger and snapshot files.

---

@@ -62,14 +69,14 @@ This section describes how :term:`Operators` manage the different nodes constitu
---

:fa:`laptop-code` :doc:`platforms/index`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Platforms supported by CCF.
Platforms and trusted execution environments supported by CCF.

---

:fa:`wrench` :doc:`troubleshooting`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Troubleshooting tips for unexpected events.

@@ -94,10 +101,10 @@ This section describes how :term:`Operators` manage the different nodes constitu
start_network
configuration
ledger_snapshot
data_persistence
code_upgrade
certificates
recovery
data_persistence
network
platforms/index
troubleshooting
4 changes: 2 additions & 2 deletions doc/operations/ledger_snapshot.rst
@@ -59,7 +59,7 @@ Snapshots are generated at regular intervals by the current primary node and sto

.. note:: Because the generation of a snapshot requires a new ledger chunk to be created (see :ref:`operations/ledger_snapshot:File Layout`), all nodes in the network must be started with the same ``snapshots.tx_count`` value.

To guarantee that the identity of the primary node that generated the snapshot can be verified offline, the SHA-256 digest of the snapshot (i.e. evidence) is recorded in the ``public:ccf.internal.snapshot_evidence`` table. The snapshot evidence will be signed by the primary node on the next signature transaction (see :ref:`operations/configuration:``ledger_signatures```).
To guarantee that the identity of the primary node that generated the snapshot can be verified offline, the SHA-256 digest of the snapshot (i.e. evidence) is recorded in the :ref:`audit/builtin_maps:``snapshot_evidence``` table. The snapshot evidence will be signed by the primary node on the next signature transaction (see :ref:`operations/configuration:``ledger_signatures```).

Committed snapshot files are named ``snapshot_<seqno>_<evidence_seqno>.committed``, with ``<seqno>`` the sequence number of the state of the key-value store at which they were generated and ``<evidence_seqno>`` the sequence number at which the snapshot evidence was recorded.

@@ -70,7 +70,7 @@ Join or Recover From Snapshot

Once a snapshot has been generated by the primary, operators can copy or mount the `read-only` snapshot directory to the new node directory before it is started. On start-up, the new node will automatically resume from the latest committed snapshot file in the ``snapshots.directory`` directory. If no snapshot file is found, all historical transactions will be replicated to that node.

It is important to note that new nodes cannot resume from a snapshot and join a service via a node that started from a more recent snapshot. For example, if a new node resumes from a snapshot generated at ``seqno 100`` and joins from a (primary) node that originally resumed from a snapshot at ``seqno 50``, the new node will throw a ``StartupSeqnoIsOld`` error shortly after starting up. It is expected that operators copy the *latest* committed snapshot file to new nodes before start up.
It is important to note that new nodes cannot join a service if the snapshot they start from is older than the snapshot the primary node started from. For example, if a new node resumes from a snapshot generated at ``seqno 50`` and joins from a (primary) node that originally resumed from a snapshot at ``seqno 100``, the new node will throw a ``StartupSeqnoIsOld`` error shortly after starting up. It is expected that operators copy the *latest* committed snapshot file to new nodes before start-up.
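Selecting the latest committed snapshot to copy to a new node could be sketched as below. This is an illustrative helper, not a CCF tool: it relies only on the ``snapshot_<seqno>_<evidence_seqno>.committed`` naming described above, and the function name ``latest_committed_snapshot`` is hypothetical.

```python
import re

# Committed snapshots are named snapshot_<seqno>_<evidence_seqno>.committed
SNAPSHOT_RE = re.compile(r"^snapshot_(\d+)_(\d+)\.committed$")


def latest_committed_snapshot(file_names):
    """Return the committed snapshot file name with the highest seqno, or None."""
    best = None
    for name in file_names:
        match = SNAPSHOT_RE.match(name)
        if match is None:
            continue  # skip uncommitted snapshots and unrelated files
        seqno = int(match.group(1))
        if best is None or seqno > best[0]:
            best = (seqno, name)
    return None if best is None else best[1]
```

In practice the names would come from listing the ``snapshots.directory`` directory of an existing node, for example with ``os.listdir``.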

Historical Transactions
~~~~~~~~~~~~~~~~~~~~~~~
5 changes: 3 additions & 2 deletions doc/operations/resource_usage.rst
@@ -4,8 +4,9 @@ Resource Usage
CPU
---

A CCF application currently consists of two threads. A host thread manages sockets and files, and handles communication with the enclave via ringbuffers.
An enclave thread contains the TLS termination, all cryptography, and the application and key value code. It communicates with the host via ringbuffers too.
A single CCF node process runs at least two threads. A host thread manages sockets and files, and handles communication with the enclave via ring-buffers.
An enclave thread contains the TLS termination, all cryptography, and the application and key-value code. It communicates with the host via ring-buffers too.
Additional worker threads can be run inside the enclave via the :ref:`operations/configuration:``worker_threads``` configuration entry.

Memory
------
4 changes: 2 additions & 2 deletions doc/operations/run_setup.rst
@@ -42,7 +42,7 @@ The runtime images do not contain any particular CCF application, and may be hel
C++ Apps
~~~~~~~~

The ``mcr.microsoft.com/ccf/app/run`` container can be run to deploy C++ apps. It contains the ``cchost`` binary and the dependencies required to spin up a CCF node.
The `mcr.microsoft.com/ccf/app/run <https://mcr.microsoft.com/en-us/product/ccf/app/run>`_ container can be run to deploy C++ apps. It contains the ``cchost`` binary and the dependencies required to spin up a CCF node.

.. tab:: SGX

@@ -69,7 +69,7 @@ The ``mcr.microsoft.com/ccf/app/run`` container can be run to deploy C++ apps. I
JavaScript/TypeScript Apps
~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``mcr.microsoft.com/ccf/app/run`` container can be run to deploy JavaScript/TypeScripts apps. It contains the ``cchost`` binary, the ``libjs_generic`` native application to run JavaScript/TypeScript apps, and the dependencies required to spin up a CCF node.
The `mcr.microsoft.com/ccf/app/run-js <https://mcr.microsoft.com/en-us/product/ccf/app/run-js>`_ container can be run to deploy JavaScript/TypeScript apps. It contains the ``cchost`` binary, the ``libjs_generic`` native application to run JavaScript/TypeScript apps, and the dependencies required to spin up a CCF node.

.. tab:: SGX

2 changes: 1 addition & 1 deletion doc/operations/troubleshooting.rst
@@ -132,6 +132,6 @@ Info Messages

Signal 13 (`SIGPIPE`) is emitted on writes to closed fds. It is superfluous in programs that handle write errors, such as CCF, and is therefore ignored. This message does not indicate a malfunction.

Most CCF releases set the `SIG_IGN` handler, but a bug introduced in Open Enclave `0.18.0 <https://github.com/openenclave/openenclave/releases/tag/v0.18.0>` caused the process to crash rather than ignore the signal. CCF installed an alternative handler as a workaround in `2.0.2 <https://github.com/microsoft/CCF/releases/tag/ccf-2.0.2>`_ , which produces this log line.
Most CCF releases set the `SIG_IGN` handler, but a bug introduced in Open Enclave `0.18.0 <https://github.com/openenclave/openenclave/releases/tag/v0.18.0>`_ caused the process to crash rather than ignore the signal. CCF installed an alternative handler as a workaround in `2.0.2 <https://github.com/microsoft/CCF/releases/tag/ccf-2.0.2>`_, which produces this log line.

The issue was fixed upstream in Open Enclave `0.18.1 <https://github.com/openenclave/openenclave/releases/tag/v0.18.1>`_ (see `#4542 <https://github.com/openenclave/openenclave/issues/4542>`_). This log line is now redundant and will be removed from later releases.
13 changes: 5 additions & 8 deletions livehtml.sh
@@ -4,14 +4,11 @@

set -e

echo "Generate version.py if it doesn't already exist"
if [ ! -f "python/version.py" ]
then
    mkdir -p tmp_build
    cd tmp_build
    cmake -L -GNinja -DCOMPILE_TARGET=virtual ..
    cd ..
    rm -rf tmp_build
if [ ! -f "python/version.py" ]; then
    echo "Generate version.py if it doesn't already exist"
    mkdir -p tmp_build && cd tmp_build
    cmake -L -GNinja -DCOMPILE_TARGET=virtual ..
    cd .. && rm -rf tmp_build
fi

echo "Setting up Python environment..."
