Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source-google-sheets-native: Default first row as headers #2443

Open
jwhartley opened this issue Feb 25, 2025 · 2 comments
Open

source-google-sheets-native: Default first row as headers #2443

jwhartley opened this issue Feb 25, 2025 · 2 comments

Comments

@jwhartley
Copy link

jwhartley commented Feb 25, 2025

Summary
Update source-google-sheets-native to default the first row as headers

Problem statement
Users often don't use freeze the first row despite wanting the first row to be used as headers

User Stories
As a trial user, I want to be able to import sheets smoothly so that I can rapidly prototype and be confident Estuary just works

Priority + Impact
Priority Low
Impact Low
Customers have run into this, which is annoying for them (have seen a couple of questions in the last couple months)
Users google sheets as a first capture is relatively common, and this not working as expected will turn some small number of them off

Context: Data, Slack Threads
Full disclosure: I'm ex-google so have a skewed view of how normies use gsheets
Current behaviour is on purpose because "users would use sheets containing only data, where the first data row was interpreted as header"
As discussed May 2024

Examples of customers running into this today Feb 25, Feb 7, Jan 18

Analysis of existing captures/collections (sheet):
I looked at all the source-google-sheets-native captures/collections I could find with data in paying customers, and found:
Out of 21 collections (across 10 tenants):

  • 12 had frozen headers (most people get this right)
  • 3 had frozen headers after initially not (some people get this right after missing it initially)
  • 6 had no frozen headers
  • 1 had no headers in the data (one capture did not want first row headers*)
  • 1 had headers in the data (didn't get headers when they should've)
  • 4 had no data to investigate (can't tell what they wanted)
    I interpret this as: most collections want headers from the first row by default, and a small minority don't want headers from the first row by default

Proposed Solution

  1. Default to using the first row as headers, without requiring the frozen row
  2. For customers who want to keep lettered columns, provide a capture config option: 'Use letters for column headers instead of first row'

Alternatives considered
We could skip #2 if it's hard to implement, and force this minority of users to create a header themselves (potentially annoying if they don't have write access to the sheet and aren't familiar with =importrange/gsheets)

@dyaffe
Copy link
Member

dyaffe commented Feb 25, 2025

Agreed that this would be a nice improvement. I'm not sure if anyone realistically wants lettered columns. Would just note explicitly that what we should likely do is:

  1. Look for a frozen header and use it
  2. If it doesn't exist, use the first row

Does that sound right @jwhartley

@jwhartley
Copy link
Author

jwhartley commented Feb 25, 2025

@dyaffe You are saying that if there are >1 frozen rows, pick the last frozen row as the header row?
Note today that doesn't work (I had a sheet with 2 frozen rows, and it didn't pick up the second as the header)

So the header priority would be:

  1. 'Use letters for column headers instead of first row' advanced toggle if checked
  2. Last frozen row if there's frozen rows
  3. First row otherwise (most common)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants