Missing group-id assignments in data export #1930

metasoarous · 2025-02-18T22:07:45Z

Expected behavior:
The number of nonempty entries in the group-id column of participant-votes.csv should match the "grouped" counts from the automated report.

Actual behavior:
There's an order of magnitude difference between the counts.

To Reproduce:
Download the export and count the nonempty group-id entries with pandas or another CSV processing utility.
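For anyone who wants to repeat the count without extra dependencies, here is a minimal sketch using only Python's standard library. The inline sample stands in for participant-votes.csv; only the `group-id` column name comes from the export, and the other column names and all values are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_group_ids(csv_file, column="group-id"):
    """Tally the values in the group-id column; empty cells are counted under None."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip()
        counts[value if value else None] += 1
    return counts


# Tiny made-up sample standing in for participant-votes.csv (hypothetical data).
sample = io.StringIO(
    "participant,group-id,n-votes\n"
    "0,0,12\n"
    "1,1,9\n"
    "2,,4\n"
    "3,,7\n"
)

counts = count_group_ids(sample)
grouped = sum(n for value, n in counts.items() if value is not None)
print(f"grouped: {grouped}, ungrouped: {counts[None]}")
```

To run it against a real export, pass `open("participant-votes.csv", newline="")` instead of the sample and compare `grouped` with the report's "grouped" count.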

Screenshots:
Counts from the automated report interface:

Counts from applying the csvstat utility from Python's csvkit to the export data (for clarity, note that it's interpreting 0/1 as boolean):

Counts from the pandas value_counts() method on the export data:

Additional context:

Reviewing the export data in greater detail, it seems that all of the participants who were grouped fall within the pid range 1-170, so they are front-loaded in terms of when they joined the conversation.
It could be a red herring, but it was a bit suspicious that the number of grouped participants (as inferred by the export data) was exactly 100 🤔
This didn't change over the course of a couple of hours, during which there's been active participation.


metasoarous · 2025-02-19T22:25:56Z

Update: It looks like this data is getting updated, but is still capping out at 100 nonzero group-id entries.

The latest counts on this conversation:

This specific number (100) has me thinking that the issue may have to do with an error in how the export is using the math blob participant buckets to assign group-ids, since the number of buckets is set to 100 by default.

ballPointPenguin · 2025-02-20T18:42:21Z

@metasoarous FYI I'm trying to repro this locally and get a better look at root cause. Thanks for the insight!

ballPointPenguin · 2025-02-20T22:01:52Z

In my dev convo test, with 159 participants, I see that exactly the first 100 are assigned to groups, none after that.

ballPointPenguin · 2025-02-24T08:53:25Z

Interestingly, when I export the data the "old school" way, via clojure -M:run, the group-ids are all included as expected.
~~This makes me wonder if the problem is in server and not math.~~
update: I think the bug is in math, just not the part that is used for CLI export construction

metasoarous · 2025-02-24T17:03:42Z

Thanks for looking into this @ballPointPenguin!

I'm a bit skeptical that it's in the math, unless it's coming from the new math implementation. That's because I can see from the network console that the math blob that hits the report appears to have all of the base-cluster members coming through:

I think the problem is that the group-clusters members entries are pointing not directly to participants, but to the base-clusters. You need to get the base clusters via those ids, then from there to the underlying participant ids:
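A minimal sketch of that two-step lookup, in Python rather than the server's TypeScript. It assumes the math blob stores `base-clusters` as parallel `id` and `members` lists and `group-clusters` as a list of objects with `id` and `members`; those shapes, the function names, and the sample data are illustrative assumptions, not a copy of the export code.

```python
def participant_groups(math_blob):
    """Map each participant id to a group id by going through the base clusters."""
    base = math_blob["base-clusters"]
    # base-cluster id -> list of participant ids in that base cluster
    members_by_base_id = dict(zip(base["id"], base["members"]))

    pid_to_group = {}
    for group in math_blob["group-clusters"]:
        # group["members"] holds base-cluster ids, not participant ids
        for base_id in group["members"]:
            for pid in members_by_base_id.get(base_id, []):
                pid_to_group[pid] = group["id"]
    return pid_to_group


def naive_participant_groups(math_blob):
    """Hypothetical mistake: treat base-cluster ids as if they were participant ids."""
    return {
        base_id: group["id"]
        for group in math_blob["group-clusters"]
        for base_id in group["members"]
    }


# Made-up blob: 3 base clusters covering 6 participants, split into 2 groups.
blob = {
    "base-clusters": {"id": [0, 1, 2], "members": [[0, 5], [1, 2, 7], [3]]},
    "group-clusters": [
        {"id": 0, "members": [0, 2]},
        {"id": 1, "members": [1]},
    ],
}

groups = participant_groups(blob)
naive = naive_participant_groups(blob)
print(len(groups), len(naive))
```

In this toy data the correct lookup assigns all six participants, while the naive one can never assign more entries than there are base clusters. If the export made that kind of mistake, it would be consistent with the cap of 100 noted above, since the number of base clusters defaults to 100.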

I can explain more about why it's set up this way, but hopefully this should make for an easy fix.

Thanks again!

colinmegill · 2025-02-24T20:17:20Z

I also don't think it's in the math, and do believe it's in the new export endpoints. Agree!

colinmegill · 2025-02-24T20:19:17Z

Here:

https://github.com/compdemocracy/polis/blob/edge/server/src/routes/export.ts#L237

ballPointPenguin self-assigned this Feb 20, 2025
