First pass metrics logging #372

kparlante · 2013-12-04T23:33:53Z

@dannycoates : Related to #349 and #351, first set of events to log for end to end testing.

Strawman proposal, written as json:

Logged at the start of: https://github.com/mozilla/fxa-auth-server/blob/master/docs/api.md#post-v1accountcreate, https://github.com/mozilla/fxa-auth-server/blob/master/docs/api.md#post-v1raw_passwordaccountcreate

"event": {
    "eventName": "account_create_started",
    "timestamp": 1373575200000,
    "startOffset": 4777,
    "service": "FTU",
    "flow": "CreateAccount",
    "device": "FxOS 18",
    "locale": "en"
}

Logged at the point the account is successfully created:

"event": {
    "eventName": "account_created",
    "timestamp": 1373575200000,
    "startOffset": 4777,
    "service": "FTU",
    "flow": "CreateAccount",
    "device": "FxOS 18",
    "locale": "en"
}

Logged at the start of https://github.com/mozilla/fxa-auth-server/blob/master/docs/api.md#post-v1sessioncreate, https://github.com/mozilla/fxa-auth-server/blob/master/docs/api.md#post-v1raw_passwordsessioncreate; flow could be either "CreateAccount" or "AttachDevice"

"event": {
    "eventName": "device_attach_started",
    "timestamp": 1373575200000,
    "startOffset": 4777,
    "service": "FTU",
    "flow": "CreateAccount",
    "device": "FxOS 18",
    "locale": "en"
}

Logged at the point the device is successfully attached; flow could be either "CreateAccount" or "AttachDevice":

"event": {
    "eventName": "device_attached",
    "timestamp": 1373575200000,
    "startOffset": 4777,
    "service": "FTU",
    "flow": "CreateAccount",
    "device": "FxOS 18",
    "locale": "en"
}

Logged at the successful completion of https://github.com/mozilla/fxa-auth-server/blob/master/docs/api.md#post-v1certificatesign

"event": {
    "eventName": "certificate_signed",
    "timestamp": 1386191112,
    "startOffset": 4777,
    "service": "FTU",
    "flow": "DeviceAttached",
    "device": "FxOS 18",
    "locale": "en"
}
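
Since the events above share most of their fields, one helper could fill them in. This is only a minimal sketch: `buildEvent` and `context` are hypothetical names, not auth-server code, and it assumes millisecond timestamps as in the first four examples.

```javascript
// Hypothetical helper, not auth-server code: builds one event in the
// strawman shape. `context` holds the fields shared across a flow.
function buildEvent(eventName, context, now = Date.now()) {
  return {
    event: {
      eventName,                        // e.g. "account_create_started"
      timestamp: now,                   // assumed: milliseconds since the epoch
      startOffset: context.startOffset, // passed through unchanged
      service: context.service,
      flow: context.flow,
      device: context.device,
      locale: context.locale,
    },
  };
}

const context = {
  startOffset: 4777,
  service: "FTU",
  flow: "CreateAccount",
  device: "FxOS 18",
  locale: "en",
};

const started = buildEvent("account_create_started", context, 1373575200000);
console.log(JSON.stringify(started, null, 2));
```

Serializing with `JSON.stringify` also guarantees that string values such as the event name and locale are quoted correctly.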

pdehaan · 2013-12-10T22:23:08Z

@dannycoates did you want to take this (and give it a milestone)?

ckarlof · 2014-02-04T22:16:52Z

@kparlante is blocked on this.

dannycoates · 2014-02-05T23:53:23Z

I'd like to solve this in a way that auth-server logic won't ever have to do anything special for whatever type of analysis we want to do in the future. We added log.security for security events which I've already begun to hate because keeping these lines of code correct is tedious and error-prone since they don't affect the actual flow of control. They're basically like comments. I'd rather not keep adding special logging for each thing that's interested in what the auth-server is doing.

I'd like for the auth-server to log one line at the end of each request with enough information that any "following" process can get the data it needs to do its thing. Metrics is special because it needs additional data that the auth server itself doesn't care about (or even have). I'm cool with adding an optional field to the post json as an opaque pass-through field that will be logged in the request summary.

I'm pretty sure that each case a follower would be interested in is covered by the response we send to the client and hence the proposed summary log line. If not, we're missing error codes.
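
To make the idea concrete, here is a rough sketch of what such a summary line could look like. Everything in it is an assumption for illustration: the function name `summaryLine`, the field names (`op`, `path`, `status`, `errno`, `t`, `agent`), and the name `metrics` for the optional pass-through field are not taken from the auth-server.

```javascript
// Hypothetical sketch of a per-request summary line. The field names
// (op, path, status, errno, t, agent, metrics) are assumptions.
function summaryLine(request, response, startedAt, finishedAt = Date.now()) {
  const body = request.payload || {};
  const line = {
    op: "request.summary",
    method: request.method,
    path: request.path,
    status: response.statusCode,
    errno: response.errno || 0,   // 0 means the request succeeded
    t: finishedAt - startedAt,    // request duration in milliseconds
    agent: request.headers["user-agent"] || null,
  };
  // Opaque pass-through: logged as-is, never interpreted by the server.
  if (body.metrics !== undefined) {
    line.metrics = body.metrics;
  }
  return JSON.stringify(line);
}

const exampleRequest = {
  method: "POST",
  path: "/v1/account/create",
  headers: { "user-agent": "FxOS 18" },
  payload: {
    email: "me@example.com",
    metrics: { service: "FTU", flow: "CreateAccount" },
  },
};
const exampleLine = summaryLine(exampleRequest, { statusCode: 200 }, 1000, 1042);
console.log(exampleLine);
```

The server copies the `metrics` object into the line without reading it, so the request-handling logic never branches on metrics data.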

I'd like to try something like this first because if it works it should be easier for everyone. If it doesn't then I'll suck it up and add special logging :)

dannycoates · 2014-02-05T23:58:19Z

#541

ckarlof · 2014-02-06T00:05:32Z

Go for it!

kparlante · 2014-02-10T23:51:52Z

@dannycoates : fwiw, makes sense to me -- error-prone metrics gathering that is out of sync with the actual flow of control has bedeviled us in the past. As long as the logging happens in the error cases (as you mentioned), this approach sounds more reliable.

kparlante · 2014-02-11T00:27:23Z

@dannycoates : A few notes on the fields:

"eventName": we can align "eventName" exactly to the endpoint. Presumably whether or not the endpoint "succeeded" is also going to be logged (vs. separate "device_attach_started" & "device_attach_succeeded" events as described above).
"device" can just be a pass-through of the user agent.
"service" can be passed on for endpoints where we have it.
"locale" can be a pass-through of the locale.
"flow": if the auth-server can infer this, or pass along information so that a heka filter can infer this, it would be great. (If not, we can pass it through from the client eventually, as you described above.)
We can ignore "startOffset" for now; again, it is pass-through information.

The fields are listed above in order of importance re: the user stories that are prioritized most highly
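
A follower process could derive these fields from a summary line roughly as follows. This is a sketch under assumptions: the summary field names (`path`, `errno`, `agent`, `locale`, `metrics`) and the function `toMetricsEvent` are hypothetical, and "flow" is only passed through here rather than inferred.

```javascript
// Hypothetical follower-side mapping; the summary field names are assumptions.
function toMetricsEvent(summary) {
  const passthrough = summary.metrics || {};
  return {
    eventName: summary.path,               // aligned exactly to the endpoint
    succeeded: summary.errno === 0,        // instead of separate started/succeeded events
    device: summary.agent || null,         // raw user agent, categorized later
    service: passthrough.service || null,  // only present for some endpoints
    locale: summary.locale || null,
    flow: passthrough.flow || null,        // inference is left to a downstream filter
    startOffset: passthrough.startOffset ?? null,
  };
}

const exampleSummary = {
  path: "/v1/session/create",
  errno: 0,
  agent: "FxOS 18",
  locale: "en",
  metrics: { service: "FTU", flow: "AttachDevice" },
};
const metricsEvent = toMetricsEvent(exampleSummary);
console.log(metricsEvent);
```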

dannycoates · 2014-02-11T00:43:36Z

To clarify one point that isn't obvious from my description, the summary line will contain richer data than is sent to the client. For example if we track failed auth attempt count per email it will appear in the summary but not in the error sent to the client.

The auth-server will probably log details that we don't want to keep in a log file directly (or at least archive) but are useful to perform aggregation on. I imagine heka could aggregate and anonymize the data for suitable long-term storage. My point being we probably shouldn't keep raw auth-server log files long-term.
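
As a rough illustration of that aggregation step (written in plain JavaScript, not as an actual heka filter), the sketch below counts requests per endpoint and error code and drops everything else, including the email. The function name, the field names, and the `errno` values are placeholders.

```javascript
// Hypothetical aggregation step, standing in for what a heka filter might do.
// Only path, errno, and a count survive; identifying fields are dropped.
function aggregateSummaries(lines) {
  const buckets = new Map();
  for (const line of lines) {
    const key = `${line.path}|${line.errno}`;
    const bucket =
      buckets.get(key) || { path: line.path, errno: line.errno, count: 0 };
    bucket.count += 1;
    buckets.set(key, bucket);
  }
  return [...buckets.values()];
}

const rawLines = [
  { path: "/v1/session/create", errno: 0, email: "a@example.com" },
  { path: "/v1/session/create", errno: 103, email: "b@example.com" }, // placeholder error code
  { path: "/v1/session/create", errno: 103, email: "b@example.com" },
];
const aggregated = aggregateSummaries(rawLines);
console.log(aggregated);
```

Only the aggregated buckets would need to go to long-term storage; the raw lines could be discarded after a short retention window.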

kparlante · 2014-02-11T19:15:38Z

Yes, I'm assuming that heka will aggregate, anonymize, and otherwise massage data into usefulness before it lands in elasticsearch. (And I'm working with trink to make that happen for the PM/UX related metrics data -- devops is working on getting us a dev environment with production data). That is where we can massage the user agent into device categories, for example. Agreed that we shouldn't be querying raw auth-server log files, or keeping them around long-term.

seanmonstar · 2014-02-11T21:31:34Z

Just to toss in another option: I solved this in Persona using intel, such that we just logged whatever we wanted, and then configured handlers to listen for certain messages, formatted them, and then sent them to the proper destination (log file, kpiggybank, statsd, etc).
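
The general shape of that approach can be sketched in plain JavaScript. To be clear, this is not intel's API; `LogRouter`, `addHandler`, and the message names are made up for illustration, and arrays stand in for the real destinations.

```javascript
// Plain-JavaScript illustration of handler routing; this is NOT intel's API.
class LogRouter {
  constructor() {
    this.handlers = [];
  }

  // matches: decides whether a handler wants a message
  // format:  shapes the message for its destination
  // send:    delivers the formatted message
  addHandler(matches, format, send) {
    this.handlers.push({ matches, format, send });
  }

  log(name, record) {
    for (const { matches, format, send } of this.handlers) {
      if (matches(name)) {
        send(format(name, record));
      }
    }
  }
}

const logFile = [];     // stand-in for a log file
const metricsSink = []; // stand-in for a metrics destination

const router = new LogRouter();
router.addHandler(
  () => true,
  (name, record) => `${name} ${JSON.stringify(record)}`,
  (message) => logFile.push(message)
);
router.addHandler(
  (name) => name.startsWith("metrics."),
  (name, record) => ({ event: name, ...record }),
  (message) => metricsSink.push(message)
);

router.log("metrics.account_created", { service: "FTU" });
router.log("db.connected", { ms: 12 });
console.log(logFile, metricsSink);
```

The calling code just logs; which messages reach which destination is decided entirely by handler configuration.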

dannycoates · 2014-02-14T22:46:20Z

Closed by #565

ghost assigned dannycoates Dec 10, 2013

ckarlof modified the milestones: Feb 14, Q1-2014 (Mar 31) Feb 4, 2014

ckarlof added the ❤❤❤ label Feb 4, 2014

pdehaan mentioned this issue Feb 4, 2014

write posted metrics to the log #376

Closed

dannycoates closed this as completed Feb 14, 2014

rfk pushed a commit that referenced this issue Oct 24, 2018

Merge pull request #372 from jrgm/remove-awsbox

c94684b

chore(awsbox): remove unused awsbox
