docs: initial analytics docs #40

Open · wants to merge 4 commits into base: master
5 changes: 5 additions & 0 deletions apps/database-new/next-env.d.ts
@@ -0,0 +1,5 @@
/// <reference types="next" />
/// <reference types="next/image-types/global" />

// NOTE: This file should not be edited
// see https://nextjs.org/docs/basic-features/typescript for more information.
7 changes: 7 additions & 0 deletions apps/docs/app/page.tsx
@@ -68,6 +68,13 @@ const products = [
description:
'Globally distributed, server-side functions to execute your code closest to your users for the lowest latency.',
},
{
title: 'Analytics',
icon: 'database',
hasLightIcon: true,
href: '/guides/warehouse',
description: 'Data analytics for ingesting and querying timeseries events.',
},
]

const migrationGuides = [
2 changes: 2 additions & 0 deletions apps/docs/components/Feedback/Feedback.utils.ts
@@ -26,6 +26,8 @@ const getNotionTeam = (pathname: string) => {
return 'team-ai'
case 'cli':
return 'team-cli'
case 'warehouse':
return 'team-analytics'

// Ignoring platform for now because that section is a mix of teams.
case 'platform':
6 changes: 6 additions & 0 deletions apps/docs/components/Navigation/Navigation.commands.tsx
@@ -40,6 +40,12 @@ const navCommands = [
route: '/guides/realtime',
icon: () => <ArrowRight />,
},
{
id: 'nav-warehouse',
name: 'Go to Analytics',
route: '/guides/warehouse',
icon: () => <ArrowRight />,
},
{
id: 'nav-ai',
name: 'Go to AI & Vectors',
@@ -49,6 +49,8 @@ function getMenuIcon(menuKey: string, width: number = 16, height: number = 16, c
return <IconMenuEdgeFunctions width={width} height={height} className={className} />
case 'realtime':
return <IconMenuRealtime width={width} height={height} className={className} />
case 'warehouse':
return <IconMenuDatabase width={width} height={height} className={className} />
case 'storage':
return <IconMenuStorage width={width} height={height} className={className} />
case 'ai':
@@ -44,6 +44,12 @@ export const GLOBAL_MENU_ITEMS: GlobalMenuItems = [
href: '/guides/realtime',
level: 'realtime',
},
{
label: 'Analytics',
icon: 'warehouse',
href: '/guides/warehouse',
level: 'warehouse',
},
{
label: 'AI & Vectors',
icon: 'ai',
@@ -1389,6 +1395,26 @@ export const realtime: NavMenuConstant = {
],
}

export const warehouse: NavMenuConstant = {
icon: 'warehouse',
title: 'Analytics',
url: '/guides/warehouse',
items: [
{
name: 'Overview & quickstart',
url: '/guides/warehouse',
},
{
name: 'Concepts',
url: undefined,
items: [
{ name: 'Ingestion', url: '/guides/warehouse/ingestion' },
{ name: 'BigQuery Backend', url: '/guides/warehouse/backends/bigquery' },
],
},
],
}

export const storage: NavMenuConstant = {
icon: 'storage',
title: 'Storage',
@@ -12,6 +12,7 @@ enum MenuId {
Auth = 'auth',
Functions = 'functions',
Realtime = 'realtime',
Warehouse = 'warehouse',
Storage = 'storage',
Ai = 'ai',
Platform = 'platform',
@@ -85,6 +86,10 @@ const menus: Menu[] = [
id: MenuId.Realtime,
type: 'guide',
},
{
id: MenuId.Warehouse,
type: 'guide',
},
{
id: MenuId.Storage,
type: 'guide',
@@ -123,6 +123,8 @@ export const getMenuId = (pathname: string | null) => {
return MenuId.Platform
case pathname.startsWith('realtime'):
return MenuId.Realtime
case pathname.startsWith('warehouse'):
return MenuId.Warehouse
case pathname.startsWith('resources'):
return MenuId.Resources
case pathname.startsWith('self-hosting'):
6 changes: 6 additions & 0 deletions apps/docs/content/guides/getting-started/features.mdx
@@ -164,6 +164,12 @@ Execute an Edge Function in a region close to your database. [Docs](/docs/guides

Edge functions natively support NPM modules and Node built-in APIs. [Link](https://supabase.com/blog/edge-functions-node-npm).

## Analytics

### Event Ingestion

Ingest and query JSON timeseries data. [Docs](/docs/guides/warehouse/ingestion).

## Project management

### CLI
1 change: 1 addition & 0 deletions apps/docs/content/guides/platform.mdx
@@ -19,6 +19,7 @@ Each project on Supabase comes with:
- [Edge Functions](/docs/guides/functions)
- [Realtime API](/docs/guides/realtime)
- [Storage](/docs/guides/storage)
- [Analytics](/docs/guides/warehouse)

## Organizations

74 changes: 74 additions & 0 deletions apps/docs/content/guides/warehouse.mdx
@@ -0,0 +1,74 @@
---
id: 'Analytics'
title: 'Event Analytics'
description: 'Scalable data analytics'
subtitle: 'Scalable data analytics for observability, metrics, and more.'
hideToc: true
---

Supabase Analytics is an event ingestion and querying engine that lets you store, dispatch, and query events from one or more databases.

## Features

### Scalable Storage and Querying Costs

Columnar databases allow for fast analysis while keeping storage compact. Costs scale predictably with the amount of data stored, giving users peace of mind when managing billing and infrastructure costs.

Lucene-based event management systems worked well before the advent of scalable database options, but they become prohibitively expensive beyond a certain scale and volume, and the data then needs to be shipped elsewhere for long-term analysis.

Analytics connects to databases such as BigQuery to store massive volumes of data, while also providing tooling to abstract away the infrastructure intricacies of working with the underlying storage engine.

### Bring Your Own Backends

Analytics can integrate with your own backends, managing the ingestion pipeline and maximizing event throughput. This ensures maximum flexibility for storing sensitive data.

Bringing your own backend gives Supabase customers complete control over storage and querying costs.

### Schema Management

When events are ingested, the backend's schema is automatically managed by Analytics, allowing you to insert JSON payloads without having to worry about data type changes.

When new fields are sent to Analytics, the data type is detected automatically and merged into the current table schema.
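
For example, here is a minimal sketch using the JSON ingestion endpoint from the quickstart below (the collection name, access token, and field names are placeholders): the second request introduces a new `metadata.status_code` field, which Analytics detects and merges into the collection schema automatically.

```bash
# Initial event: the collection schema contains `message` and `metadata.some`
curl -X "POST" "https://api.warehouse.tech/api/events/json?collection_name=YOUR-COLLECTION-NAME-HERE" \
  -H 'Content-Type: application/json; charset=utf-8' \
  -H 'Authorization: Bearer YOUR-ACCESS-TOKEN-HERE' \
  -d $'[{"message": "user signed in", "metadata": {"some": "log event"}}]'

# Later event: `metadata.status_code` is new, so its type is detected
# and merged into the existing schema without any manual migration
curl -X "POST" "https://api.warehouse.tech/api/events/json?collection_name=YOUR-COLLECTION-NAME-HERE" \
  -H 'Content-Type: application/json; charset=utf-8' \
  -H 'Authorization: Bearer YOUR-ACCESS-TOKEN-HERE' \
  -d $'[{"message": "user signed in", "metadata": {"some": "log event", "status_code": 200}}]'
```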

## Quickstart

1. Create a collection

Head over to the [Logs & Analytics page](https://supabase.com/dashboard/project/_/logs/explorer).

Create a **New Collection** under the Analytics Events section.

2. Retrieve ingestion access token

Retrieve your public ingestion access token from [Analytics Settings](https://supabase.com/dashboard/project/_/settings/Analytics) by clicking the copy button.

3. Send an event

Execute this cURL command to send an event to Analytics.

Replace the `YOUR-COLLECTION-NAME-HERE` and `YOUR-ACCESS-TOKEN-HERE` placeholders with the values from steps 1 and 2.

```bash
# By collection name
curl -X "POST" "https://api.warehouse.tech/api/events/json?collection_name=YOUR-COLLLECTION-NAME-HERE" \
-H 'Content-Type: application/json; charset=utf-8' \
-H 'Authorization: Bearer YOUR-ACCESS-TOKEN-HERE' \
-d $'[{
"message": "This is the main event message",
"metadata": {"some": "log event"}
}]'


# By collection UUID
curl -X "POST" "https://api.warehouse.tech/api/events/json?collection=YOUR-COLLECTION-UUID-HERE" \
-H 'Content-Type: application/json; charset=utf-8' \
-H 'Authorization: Bearer YOUR-ACCESS-TOKEN-HERE' \
-d $'[{
"message": "This is the main event message",
"metadata": {"some": "log event"}
}]'
```

4. Check the collection

You should see your new event appear in the collection overview. You can then search and filter the collection for specific events, or use SQL to query your Analytics collections.
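
For example, here is a minimal SQL sketch (assuming a collection named `my_collection`; the `timestamp` and `metadata` fields follow the examples in the BigQuery guide) that lists today's most recent events:

```sql
-- List today's most recent events from the collection
select timestamp, t.metadata
from `my_collection` as t
where DATE(timestamp) = CURRENT_DATE()
order by timestamp desc
limit 50;
```
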
73 changes: 73 additions & 0 deletions apps/docs/content/guides/warehouse/backends/bigquery.mdx
@@ -0,0 +1,73 @@
---
id: 'warehouse-bigquery'
title: 'BigQuery'
description: 'Learn how to use the BigQuery Backend'
subtitle: 'Learn how to use the BigQuery Backend'
sidebar_label: 'BigQuery'
---

Analytics natively supports storing events in BigQuery. Ingested events are **streamed** into BigQuery, and each collection is mapped to a BigQuery table.

## Behavior and Configuration

On table initialization, Analytics automatically applies the configuration needed to keep tables optimized.

### Ingestion

Ingested events are [streamed](https://cloud.google.com/bigquery/docs/streaming-data-into-bigquery) into BigQuery. This maximizes throughput into BigQuery, allowing Analytics to handle large volumes of events.

### Partitioning and Retention

All tables are partitioned by the `timestamp` field on a **daily** basis. This means that all queries against the BigQuery table must include a filter over the `timestamp` field.

A collection's retention setting adjusts the BigQuery table's partition expiry, so data is automatically deleted once a partition expires.

For paid plans, if the retention is not set, the collection will default to **7 days** of retention on creation.

For users on the Free plan, the maximum retention is **3 days**.
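
As a sketch (the collection name `my_collection` is a placeholder), here is a query that filters on `timestamp` so that only the partitions for the last three days are scanned:

```sql
-- Filtering on `timestamp` prunes partitions: only recent days are scanned
select DATE(timestamp) as day, count(*) as events
from `my_collection`
where timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 3 DAY)
group by day
order by day desc;
```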

#### Deep Dive: Table Partitioning

Table partitioning effectively splits a BigQuery table into many smaller tables.
When a partition has not been modified for 90 days, BigQuery charges only half the normal storage rate, so storage for these older partitions is effectively half price.

Because Analytics partitions tables over time automatically, older and less frequently queried partitions benefit from this discount, reducing total effective storage costs.

Furthermore, partitioning allows queries to scan only the selected partitions, making your queries more responsive by reading less data.

When querying against the streaming buffer, the number of bytes scanned is always zero, allowing near zero-cost queries against it. Should you need to query the streaming buffer directly, you can use the following query ([source](https://stackoverflow.com/questions/41864257/how-to-query-for-data-in-streaming-buffer-only-in-bigquery)):

```sql
select * from `my_collection` where _PARTITIONTIME is null;
```

You can read more about partitioned tables in the official Google Cloud [documentation](https://cloud.google.com/bigquery/docs/partitioned-tables).

## Querying

When querying your collections, use BigQuery SQL syntax and refer to each collection by name. Analytics automatically parses and processes the query, mapping it to the correct dataset and table name. You can also perform joins across multiple collections, as shown in the sketch below.
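
As a sketch, here is a cross-collection join (the collection names and the shared top-level `request_id` field are assumptions for illustration; your own collections would need an equivalent field to join on):

```sql
-- Join two hypothetical collections on an assumed shared `request_id` field
select
  api.timestamp,
  api.message as api_message,
  auth.message as auth_message
from `api_events` as api
join `auth_events` as auth
  on api.request_id = auth.request_id
where DATE(api.timestamp) = "2024-05-09"
  and DATE(auth.timestamp) = "2024-05-09"
order by api.timestamp desc
limit 10;
```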

### Unnesting Repeated Records

Nested columns are represented as repeated `RECORD`s in BigQuery. To query inside a nested record, you must `UNNEST` it, like so:

```sql
select timestamp, req.url, h.cf_cache_status
from
`your_collection` as t
cross join UNNEST(t.metadata) as m
cross join UNNEST(m.request) as req
cross join UNNEST(m.response) as resp
cross join UNNEST(resp.headers) as h
where DATE(timestamp) = "2019-05-09"
order by timestamp desc
limit 10;
```

### Query Result Limit

There is a 1000 row result limit for each query run.

### SELECT only

Analytics only allows `SELECT` queries. DDL statements are blocked and will result in an error.