HTTP

Caching

HTTP Caching - Google Developers

Caching headers

Caching headers:

  • Cache-Control

    • specifies caching policy: who can cache (browser or intermediate caches), under which conditions and for how long;
    • defined in HTTP/1.1 and supersedes previous headers (e.g. Expires);
  • ETag

    • used for cache validation;
    • generated by the server, is typically a hash or fingerprint of the resource;

Cache Validating

Cache validation

  • When the same resource is requested again, the browser finds it in its cache, but the cached copy has expired;
  • The browser then sends a request with the ETag value in the If-None-Match header;
  • The server checks whether the token for the requested resource has changed. If it has not, the server does not return the resource again, only 304 Not Modified; the browser can then reuse the cached copy and refresh its cache lifetime;
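The validation round trip above can be sketched in a few lines of Node.js. This is a minimal sketch, not how any particular server does it: `computeETag` and `respond` are hypothetical names, and using a truncated SHA-256 hash of the body as the token is an assumption.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: derive a validation token from the response body.
function computeETag(body) {
  const hash = crypto.createHash('sha256').update(body).digest('hex');
  return `"${hash.slice(0, 16)}"`; // ETag values are quoted strings
}

// Decide how to answer a (possibly conditional) request for `body`.
// `requestHeaders` uses lower-case names, as Node's http module does.
function respond(requestHeaders, body) {
  const etag = computeETag(body);
  const candidates = (requestHeaders['if-none-match'] || '')
    .split(',')
    .map((value) => value.trim());
  if (candidates.includes(etag) || candidates.includes('*')) {
    // Token unchanged: send no body, the browser reuses its cached copy.
    return { status: 304, headers: { ETag: etag }, body: null };
  }
  return { status: 200, headers: { ETag: etag }, body };
}

const first = respond({}, 'hello');
const second = respond({ 'if-none-match': first.headers.ETag }, 'hello');
const third = respond({ 'if-none-match': first.headers.ETag }, 'hello, world');
console.log(first.status, second.status, third.status); // 200 304 200
```

The third call shows what happens after the resource changes: the token no longer matches, so the full body is sent again.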

Cache-Control

Values for Cache-Control

  • no-store

    • disallows the browser and all intermediate caches from storing the response, so it is always downloaded in full from the server;
    • suitable for private personal or banking data;
  • no-cache

    • the response can be stored, but every subsequent request must first check with the server whether the resource has changed;
  • public vs. private

    • public means the response can be cached by any cache; it is usually unnecessary because it is implied when you use max-age;
    • private means a browser can cache the response but intermediate caches cannot, e.g. an HTML page containing private user information can be cached by the browser, but a CDN should not cache it;
  • max-age

    • indicates for how many seconds the resource is allowed to be reused, e.g. max-age=60;
  • s-maxage: applies only to CDNs and other intermediate (shared) caches, where it overrides max-age and Expires;

Define cache policy

You can use the above diagram to define the optimal cache policy for your resources. Here are some examples:

  • max-age=86400: can be cached by the browser and any intermediate caches for up to 1 day;
  • private, max-age=600: can be cached by the browser only, for up to 10 minutes;
  • no-store: disallows any caching;
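To make the directives concrete, here is a rough sketch of how a cache might interpret a Cache-Control value. `parseCacheControl` and `evaluate` are hypothetical names, and the model is deliberately simplified (it ignores Expires, validation, and many other directives).

```javascript
// Parse a Cache-Control value into a map of directive -> argument (or true).
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(',')) {
    const [name, arg] = part.trim().split('=');
    if (name) directives[name.toLowerCase()] = arg === undefined ? true : arg;
  }
  return directives;
}

// cacheType is 'browser' or 'shared' (a CDN or other intermediate cache).
// Returns whether the response may be stored and for how many seconds it
// may be reused without asking the server again.
function evaluate(value, cacheType) {
  const d = parseCacheControl(value);
  if (d['no-store']) return { store: false, freshSeconds: 0 };
  if (d['private'] && cacheType === 'shared') return { store: false, freshSeconds: 0 };
  if (d['no-cache']) return { store: true, freshSeconds: 0 };
  const useShared = cacheType === 'shared' && d['s-maxage'] !== undefined;
  const age = useShared ? d['s-maxage'] : d['max-age'];
  return { store: true, freshSeconds: typeof age === 'string' ? Number(age) || 0 : 0 };
}

console.log(evaluate('max-age=86400', 'shared'));         // stored, fresh for 86400 s
console.log(evaluate('private, max-age=600', 'browser')); // stored, fresh for 600 s
console.log(evaluate('private, max-age=600', 'shared'));  // not stored
console.log(evaluate('no-store', 'browser'));             // not stored
```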

Invalidating and updating cached responses

  • Local caches are used until the resource 'expires';
  • Embedding a file content fingerprint in the URL enables you to force the client to update to a new version of the response;
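A content fingerprint is typically a short hash of the file's bytes placed in the file name. The sketch below assumes an 8-character SHA-256 prefix and a `name.hash.ext` naming scheme; `fingerprintedName` is a hypothetical helper, and build tools each have their own conventions.

```javascript
const crypto = require('node:crypto');
const path = require('node:path');

// Build a file name that changes whenever the content changes.
function fingerprintedName(fileName, content) {
  const hash = crypto.createHash('sha256').update(content).digest('hex').slice(0, 8);
  const { name, ext } = path.parse(fileName);
  return `${name}.${hash}${ext}`;
}

const v1 = fingerprintedName('style.css', 'body { color: red; }');
const v2 = fingerprintedName('style.css', 'body { color: blue; }');
console.log(v1, v2); // two different names of the form style.<hash>.css
```

Because the URL itself changes, a client holding the old file under a long max-age never needs to be told it is stale: it simply requests the new URL.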

Cache hierarchy

  • The HTML is marked no-cache, so the browser revalidates it on every request. Because the markup references static resources by fingerprinted URLs, the HTML changes whenever one of those resources is updated and is then downloaded again; if nothing has changed, the server returns 304;
  • The CSS may be cached by the browser and intermediate caches for 1 year. Whenever it is updated, its fingerprint changes, which changes the HTML, so the new CSS is downloaded under its new URL;
  • The JS file is treated like the CSS file, except that it is marked private, so only the browser can cache it;
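The hierarchy described above could be expressed as a small lookup from file type to Cache-Control value. This is only an illustration: `cacheControlFor` is a hypothetical name, and the fallback for unlisted file types is an assumption.

```javascript
const path = require('node:path');

const ONE_YEAR = 365 * 24 * 60 * 60; // 31536000 seconds

// Pick a Cache-Control value following the HTML / CSS / JS example.
function cacheControlFor(fileName) {
  switch (path.extname(fileName).toLowerCase()) {
    case '.html':
      return 'no-cache'; // always revalidate the entry document
    case '.css':
      return `max-age=${ONE_YEAR}`; // fingerprinted, cacheable anywhere
    case '.js':
      return `private, max-age=${ONE_YEAR}`; // fingerprinted, browser only
    default:
      return 'no-cache'; // assumption: revalidate anything unclassified
  }
}

console.log(cacheControlFor('index.html'));        // no-cache
console.log(cacheControlFor('style.3da37df.css')); // max-age=31536000
console.log(cacheControlFor('app.8ccf9ba.js'));    // private, max-age=31536000
```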

Heuristic freshness

If a response sets neither Cache-Control: max-age nor Expires, HTTP still allows it to be cached using what is called heuristic freshness, often based on the Last-Modified header, see https://webmasters.stackexchange.com/a/111299;

This behaviour does not appear to be consistent across browsers (Firefox always refetches the resource, while Chrome may load it from the cache).
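One commonly cited heuristic, mentioned in the HTTP caching RFC as a typical setting, is to treat a response as fresh for 10% of the time elapsed since it was last modified. The sketch below assumes that fraction; `heuristicFreshnessSeconds` is a hypothetical name, and real browsers may use different rules.

```javascript
// Estimate a freshness lifetime from the Date and Last-Modified headers.
// `headers` uses lower-case names; `fraction` defaults to the assumed 10%.
function heuristicFreshnessSeconds(headers, fraction = 0.1) {
  const date = Date.parse(headers['date']);
  const lastModified = Date.parse(headers['last-modified']);
  if (Number.isNaN(date) || Number.isNaN(lastModified) || lastModified > date) {
    return 0; // no usable information: treat the response as stale
  }
  return Math.floor(((date - lastModified) / 1000) * fraction);
}

const lifetime = heuristicFreshnessSeconds({
  'date': 'Wed, 11 Jan 2023 00:00:00 GMT',
  'last-modified': 'Sun, 01 Jan 2023 00:00:00 GMT',
});
console.log(lifetime); // 86400 (10% of 10 days)
```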

Caching checklist

  • Use consistent URLs: if the same content is served on different URLs, the browser treats them as different resources and fetches the content multiple times;
  • Ensure the server provides a validation token (ETag);
  • Identify which resources can be cached by intermediate caches: responses that are identical for all users are great candidates for caching by a CDN;
  • Determine the optimal cache lifetime for each resource: set an appropriate max-age for different resources;
  • Determine the best cache hierarchy: a short lifetime or no-cache for the containing HTML document (or an entry JS file which loads other JS files) combined with a long lifetime for contained resources with fingerprinted URLs is usually a good strategy;
  • Split code: move the frequently updated part of your code into separate files, so that the remainder (e.g. library code that doesn't change very often) can be fetched from cache;

Nginx server configs

  • Nginx sets etag on by default, which generates the ETag response header for static resources (http://nginx.org/en/docs/http/ngx_http_core_module.html#etag);

  • The expires directive sets the Expires and Cache-Control headers (max-age is calculated from the time specified), see http://nginx.org/en/docs/http/ngx_http_headers_module.html

    expires    off;                       # default value
    expires    24h;                       # expires in 24 hours from now
    expires    modified +24h;             # relative to modified time
    expires    @24h;                      # specify a time of day
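    expires    0;                         # Expires is the current time, Cache-Control: max-age=0
    expires    -1;                        # negative time, Cache-Control: no-cache
    expires    epoch;                     # Expires: 'Thu, 01 Jan 1970 00:00:01 GMT', Cache-Control: no-cache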
    add_header Cache-Control private;     # add Cache-Control directly

    You can also set it based on the MIME type of the resource:

    map $sent_http_content_type $expires {
      default                 off;
      text/html               epoch;
      text/css                max;
      application/javascript  max;
      application/pdf         30d;
      ~image/                 max;  # regexp matching all image MIME types
    }
    
    server {
      listen 80 default_server;
      listen [::]:80 default_server;
    
      expires $expires;
      # ... rest of the server configuration
    }

HTTPS

Get a certificate

  1. A company, organization, or individual creates a private key file (.key) and a certificate signing request file (.csr, which contains the matching public key), then sends the .csr to a CA;

  2. If the CA approves the request, it issues a certificate in the X.509 format.

    The certificate contains the public key and a signature created with the CA's private key.

Install

For Nginx, put the paths of both the certificate file and the private key file in the server config.

ssl_certificate file;
ssl_certificate_key file;

The server's certificate and any intermediate CA certificates should be put into the same file, with the server's certificate first:

-----BEGIN CERTIFICATE-----
// server cert
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
// intermediate CA cert
-----END CERTIFICATE-----

How does it work

HTTP encryption process

  • A pair of public/private keys can be used for two purposes:
    • In encryption, public key is used for encrypting a message and private key for decrypting, so only the owner of the private key can get the message;
    • In authentication, private key is used for signing a message and public key can be used for verifying, so everyone can verify a message is authentic;
    • Different algorithms are used for encryption and signing, and generally you should use a separate pair of keys for each;
  • In step 3, the client verifies the server's certificate against its pre-loaded root CA certificates;
    • If the server's certificate claims to be issued by CA Foo, the client can verify it with CA Foo's public key, which is in Foo's certificate;
    • If the server's certificate is issued by an intermediate CA, the intermediate CA's certificate should be sent along with the server's certificate, and the client verifies the chain all the way up to a root CA;
  • Starting from step 6, the client and server use only the symmetric key for encryption;
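The two uses of a key pair can be tried out with Node's built-in crypto module. This is a minimal sketch under assumptions: RSA with 2048-bit keys and SHA-256 are arbitrary choices for illustration, and the variable names are hypothetical. It follows the advice above by generating a separate pair for each purpose.

```javascript
const crypto = require('node:crypto');

// One key pair for encryption, another for signing.
const encryptionKeys = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });
const signingKeys = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });

// Encryption: anyone can encrypt with the public key,
// only the private key owner can decrypt.
const ciphertext = crypto.publicEncrypt(
  encryptionKeys.publicKey,
  Buffer.from('secret for the key owner'),
);
const decrypted = crypto.privateDecrypt(encryptionKeys.privateKey, ciphertext).toString();

// Authentication: only the private key owner can sign,
// anyone can verify with the public key.
const message = Buffer.from('message from the key owner');
const signature = crypto.sign('sha256', message, signingKeys.privateKey);
const authentic = crypto.verify('sha256', message, signingKeys.publicKey, signature);
const tampered = crypto.verify(
  'sha256',
  Buffer.from('tampered message'),
  signingKeys.publicKey,
  signature,
);

console.log(decrypted);           // secret for the key owner
console.log(authentic, tampered); // true false
```

A CA's signature on a certificate is the second case: the CA signs with its private key, and clients verify with the public key from the CA's own certificate.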

SNI

  • Server Name Indication (SNI) is an extension to the Transport Layer Security (TLS) protocol.
  • A client indicates which hostname it is attempting to connect to at the start of the handshaking process.
  • This allows a server to present multiple certificates on the same IP address and TCP port number and hence allows multiple secure (HTTPS) websites (or any other service over TLS) to be served by the same IP address without requiring all those sites to use the same certificate.
  • It is the conceptual equivalent to HTTP/1.1 name-based virtual hosting, but for HTTPS.
  • This also allows a proxy to forward client traffic to the right server during TLS/SSL handshake.
  • The desired hostname is NOT encrypted in the original SNI extension, so an eavesdropper can see which site is being requested. Firewalls use this to enforce FQDN-based rules.

Details:

Name-based virtual hosting allows multiple DNS hostnames to be hosted by a single server (usually a web server) on the same IP address. To achieve this, the server uses a hostname presented by the client as part of the protocol (for HTTP the name is presented in the Host header). However, when using HTTPS, the TLS handshake happens before the server sees any HTTP headers. Before SNI, the server therefore could not use the Host header to decide which certificate to present, so only names covered by the same certificate could be served from the same IP address.
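Conceptually, an SNI-aware server looks up the certificate by the hostname received in the handshake. The sketch below shows only that lookup: the hostnames, the string placeholders standing in for certificates, and `selectCertificate` are all hypothetical. In Node.js, a function like this could be called from the `SNICallback` option of `tls.createServer`.

```javascript
// Hypothetical table: hostname -> certificate (placeholders, not real certs).
const certificates = new Map([
  ['a.example', 'certificate for a.example'],
  ['b.example', 'certificate for b.example'],
]);

// Choose the certificate for the hostname sent in the SNI extension.
// Clients that send no SNI (servername is undefined) get the fallback.
function selectCertificate(servername, table, fallback) {
  return table.get((servername || '').toLowerCase()) || fallback;
}

console.log(selectCertificate('b.example', certificates, 'default certificate'));
console.log(selectCertificate(undefined, certificates, 'default certificate'));
```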

CORS

https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS

Cross-Origin Resource Sharing is a mechanism browsers use to control whether an application in one origin can access resources from a different origin.

CORS is used in cross-origin requests for:

  • APIs such as XMLHttpRequest or Fetch;
  • Web Fonts (@font-face in CSS);
  • Other uncommon scenarios;

Overview:

  • The server uses some HTTP headers in the response to let the browser know which origins are permitted;
  • In some cases, a "preflight" OPTIONS request is needed; once the server "approves" it, the browser sends the actual request;
  • CORS failure specifics are not available to JS; check the browser's console for details;

Simple requests

A simple request needs to meet all the following conditions:

  • Method must be one of: GET, HEAD and POST;
  • Headers must be:
    • Set automatically by the user agent: Connection, User-Agent, etc;
    • Or in the CORS safe-list, such as: Accept, Accept-Language, Content-Language, Content-Type, etc;
  • Content-Type value must be one of:
    • application/x-www-form-urlencoded
    • multipart/form-data
    • text/plain
  • No event listeners are registered on any XMLHttpRequestUpload object used in the request (these objects are accessed through the XMLHttpRequest.upload property);
  • No ReadableStream object is used in the request;
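The method, header, and Content-Type conditions can be approximated with a small check. This is a simplified sketch: `isSimpleRequest` is a hypothetical helper, the safe-list contains only the headers named above, and the upload-listener and ReadableStream conditions are not modelled.

```javascript
const SIMPLE_METHODS = new Set(['GET', 'HEAD', 'POST']);
const SAFELISTED_HEADERS = new Set([
  'accept', 'accept-language', 'content-language', 'content-type',
]);
const SIMPLE_CONTENT_TYPES = new Set([
  'application/x-www-form-urlencoded', 'multipart/form-data', 'text/plain',
]);

// `headers` holds only the headers set by the script,
// not the ones the user agent adds automatically.
function isSimpleRequest({ method, headers = {} }) {
  if (!SIMPLE_METHODS.has(method.toUpperCase())) return false;
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (!SAFELISTED_HEADERS.has(lower)) return false;
    if (lower === 'content-type') {
      const mime = value.split(';')[0].trim().toLowerCase();
      if (!SIMPLE_CONTENT_TYPES.has(mime)) return false;
    }
  }
  return true;
}

console.log(isSimpleRequest({ method: 'GET' })); // true
console.log(isSimpleRequest({
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
})); // false, so the browser would send a preflight first
```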

Simple CORS request

  • In the request, there is an Origin header indicating where the request comes from;
  • In the response, there is an Access-Control-Allow-Origin header; the value * allows any origin to access the resource;

Preflighted requests

Preflighted CORS request

  • When the browser sees a non-simple CORS request, it first sends an OPTIONS preflight request describing the real request;
  • The server responds with what is allowed, and how long the preflight response can be cached;
  • Then the browser sends the real request;
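On the server side, answering a preflight amounts to comparing the announced method and headers with a policy. The following is a rough sketch: the `policy` object, `handlePreflight`, the example origin, and the choice of 403 for a refused preflight are all assumptions made for illustration.

```javascript
// Hypothetical server policy.
const policy = {
  origins: ['https://foo.example'],
  methods: ['GET', 'POST', 'PUT'],
  headers: ['content-type', 'x-custom-header'],
  maxAgeSeconds: 600, // how long the browser may cache the preflight result
};

// `requestHeaders` uses lower-case names, as Node's http module does.
function handlePreflight(requestHeaders, policy) {
  const origin = requestHeaders['origin'];
  const method = requestHeaders['access-control-request-method'];
  const requested = (requestHeaders['access-control-request-headers'] || '')
    .split(',')
    .map((name) => name.trim().toLowerCase())
    .filter(Boolean);
  const allowed =
    policy.origins.includes(origin) &&
    policy.methods.includes(method) &&
    requested.every((name) => policy.headers.includes(name));
  if (!allowed) return { status: 403, headers: {} }; // assumed refusal status
  return {
    status: 204,
    headers: {
      'Access-Control-Allow-Origin': origin,
      'Access-Control-Allow-Methods': policy.methods.join(', '),
      'Access-Control-Allow-Headers': policy.headers.join(', '),
      'Access-Control-Max-Age': String(policy.maxAgeSeconds),
    },
  };
}

const approved = handlePreflight({
  'origin': 'https://foo.example',
  'access-control-request-method': 'PUT',
  'access-control-request-headers': 'Content-Type, X-Custom-Header',
}, policy);
console.log(approved.status, approved.headers['Access-Control-Max-Age']); // 204 600
```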

Requests with credentials

By default, browsers do not send cookies or HTTP authentication information with cross-site XMLHttpRequest or Fetch requests.

You need to set a specific flag:

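// XHR
const invocation = new XMLHttpRequest();
invocation.open('GET', 'http://bar.other/resources', true);
invocation.withCredentials = true; // send cookies and HTTP authentication info
invocation.send();

// fetch
fetch(url, {
  credentials: 'include', // send cookies and HTTP authentication info
  // ...other options
})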

Browsers accept the response only when BOTH the preflight response and the response to the actual request have

  • an Access-Control-Allow-Credentials: true header;
  • and an Access-Control-Allow-Origin header whose value is exactly the current origin, not the wildcard *;
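The two conditions can be written as a single check that would apply to each of the two responses. This is an illustrative model rather than actual browser code; `acceptsCredentialedResponse` and the example origin are hypothetical.

```javascript
// Returns true when a response to a credentialed cross-origin request
// satisfies both conditions. `responseHeaders` uses lower-case names.
function acceptsCredentialedResponse(responseHeaders, pageOrigin) {
  return (
    responseHeaders['access-control-allow-credentials'] === 'true' &&
    responseHeaders['access-control-allow-origin'] === pageOrigin // '*' never matches
  );
}

const page = 'https://foo.example';
console.log(acceptsCredentialedResponse({
  'access-control-allow-origin': 'https://foo.example',
  'access-control-allow-credentials': 'true',
}, page)); // true
console.log(acceptsCredentialedResponse({
  'access-control-allow-origin': '*',
  'access-control-allow-credentials': 'true',
}, page)); // false
```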

Cross site script

By default, you can include a script from another origin:

<script src="https://another.site/lib.js"></script>

But if an error occurs in this script, the window.onerror handler does not receive its details (many bug tracking services rely on this handler).

To allow access, the <script> tag needs a crossorigin attribute, and the remote server must provide special headers:

  • crossorigin="anonymous" (the same as a bare crossorigin): access is allowed if the server responds with an Access-Control-Allow-Origin header containing * or our origin. The browser does not send authorization information or cookies to the remote server;
  • crossorigin="use-credentials": access is allowed if the server responds with an Access-Control-Allow-Origin header containing our origin and with Access-Control-Allow-Credentials: true. The browser sends authorization information and cookies to the remote server;