Missing group-id assignments in data export #1930
Comments
Update: It looks like this data is getting updated, but it is still capping out at 100 nonzero group-id entries in the latest counts on this conversation. This specific number (100) has me thinking that the issue may be an error in how the export uses the math blob's participant buckets to assign group-ids, since the number of buckets is set to 100 by default.
@metasoarous FYI I'm trying to repro this locally and get a better look at the root cause. Thanks for the insight!
In my dev convo test, with 159 participants, I see that exactly the first 100 are assigned to groups, and none after that.
Interestingly, when I export the data the "old school" way, via clojure -M:run, the group-ids are all included as expected.
Thanks for looking into this @ballPointPenguin! I'm a bit skeptical that it's in the math, unless it's coming from the new math implementation. That's because I can see from the network console that the math blob that hits the report appears to have all of the base-clusters members coming through.
I think the problem is that the group-clusters members entries point not directly to participants, but to the base-clusters. You need to get the base clusters via those ids, and from there the underlying participant ids.
I can explain more about why it's set up this way, but hopefully this should make for an easy fix. Thanks again!
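A minimal sketch of that two-step lookup, written in Python purely for illustration (it is not the export endpoint code). The blob layout is an assumption: group-clusters as a list of records with id and members, where members holds base-cluster ids, and base-clusters as parallel id and members arrays. The function names and the toy data are hypothetical.

```python
from typing import Dict, List


def group_ids_by_participant(math_blob: dict) -> Dict[int, int]:
    """Map participant id -> group id by resolving through base clusters."""
    base = math_blob["base-clusters"]
    # Assumed layout: parallel arrays, base["id"][i] owns base["members"][i].
    members_by_base_id: Dict[int, List[int]] = dict(zip(base["id"], base["members"]))

    assignments: Dict[int, int] = {}
    for group in math_blob["group-clusters"]:
        for base_id in group["members"]:  # base-cluster ids, not participant ids
            for pid in members_by_base_id.get(base_id, []):
                assignments[pid] = group["id"]
    return assignments


def naive_group_ids(math_blob: dict) -> Dict[int, int]:
    """The suspected bug: treat group members as participant ids directly."""
    return {
        member: group["id"]
        for group in math_blob["group-clusters"]
        for member in group["members"]
    }


# Toy blob (made-up data): 3 base clusters holding 7 participants.
toy_blob = {
    "base-clusters": {"id": [0, 1, 2], "members": [[0, 3, 4], [1, 5], [2, 6]]},
    "group-clusters": [
        {"id": 0, "members": [0, 2]},
        {"id": 1, "members": [1]},
    ],
}

resolved = group_ids_by_participant(toy_blob)
naive = naive_group_ids(toy_blob)
print(len(resolved), len(naive))
```

With the naive mapping, the number of assigned ids can never exceed the number of base clusters, which would be consistent with a cap of 100 when the bucket count is left at its default of 100.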
I also don't think it's in the math, and do believe it's in the new export endpoints. Agree!
Expected behavior:
The number of nonempty entries in the group-id column of participant-votes.csv should match the "grouped" counts from the automated report.
Actual behavior:
There's an order of magnitude difference between the counts.
To Reproduce:
Download the export and count the entries in the group-id column with pandas or another CSV processing utility.
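One way to run that count with pandas is sketched below. The rows are made up and stand in for participant-votes.csv. The column names other than group-id are assumptions, as is the convention that an empty cell means "no group assigned".

```python
import io

import pandas as pd

# Stand-in for participant-votes.csv (made-up rows); replace the StringIO
# with the path to a real export. Assumes empty cells mean "no group".
sample_csv = io.StringIO(
    "participant,group-id,n-votes\n"
    "0,0,12\n"
    "1,1,8\n"
    "2,,3\n"
    "3,1,20\n"
    "4,,1\n"
)

# dtype=str keeps group ids as text, so they are not coerced to floats
# when the column has gaps.
df = pd.read_csv(sample_csv, dtype=str)

# Number of participants with any group-id at all.
grouped = df["group-id"].notna().sum()

# Per-group counts, including the missing entries.
per_group = df["group-id"].value_counts(dropna=False)

print(f"participants with a group-id: {grouped} of {len(df)}")
print(per_group)
```

The value of grouped is the number to compare against the "grouped" counts in the automated report.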
Screenshots:
Counts from the automated report interface:
Counts from applying the Python csvkit's csvstat utility to the export data (for clarity, note that it's interpreting 0/1 as boolean):
Counts from the pandas value_counts() method on the export data:
Additional context: