2017 09 12 Meeting Minutes

Sebastian Benthall edited this page Sep 12, 2017 · 5 revisions

BigBang Meeting 2017-09-12

Attendees:

  • Sebastian Benthall
  • Nick Doty
  • Harsh Gupta

Updates

Harsh has graduated and is working in Data Science.

Seb and Nick are still finishing their dissertations.

The Indian Supreme Court has said privacy is a fundamental right. CIS India was involved in that parliamentary committee.

Studying demographics

Nick has been trying to understand demographics across a large number of mailing lists. Maybe we can learn some things, starting with gender and then generalizing from there.

HG: If you do research using automated tools, shouldn't we worry about how accurate they are, so people have a bound on how well they perform?

ND: Yes, working on getting a start for now. There's a large need for manual or crowdsourced identification, but automation may help speed it up.
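One way to report such a bound is to score the automated guesses against a small hand-labeled sample. The sketch below is only an illustration, not something presented at the meeting: the function name `evaluate_guesses` and the sample data are hypothetical. It reports coverage (the share of names that received any guess) and accuracy (the share of guesses that were correct).

```python
from typing import Dict, Optional, Tuple


def evaluate_guesses(
    predictions: Dict[str, Optional[str]],
    labels: Dict[str, str],
) -> Tuple[float, float]:
    """Return (coverage, accuracy) of automated guesses on a hand-labeled sample.

    predictions maps a name to a guessed label, or None when the tool abstains.
    labels maps a name to its manually assigned label.
    """
    # Names for which the tool produced any guess at all.
    guessed = [name for name in labels if predictions.get(name) is not None]
    # Guesses that agree with the manual label.
    correct = [name for name in guessed if predictions[name] == labels[name]]
    coverage = len(guessed) / len(labels) if labels else 0.0
    accuracy = len(correct) / len(guessed) if guessed else 0.0
    return coverage, accuracy


# Hypothetical sample: four hand-labeled names, two of which received a guess.
sample_labels = {"a": "female", "b": "male", "c": "male", "d": "female"}
sample_predictions = {"a": "female", "b": "female", "c": None}
coverage, accuracy = evaluate_guesses(sample_predictions, sample_labels)
```

Reporting both numbers matters because a tool that abstains often can look very accurate while covering only a small part of the list.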

HG: Can you explain in a line or two how it works?

ND: I'm using a library that uses birth records from different countries to draw a connection between names and gender. Its drawback is that it is rather specific to some countries, while these groups are more global. Also, it has a very high confidence threshold, so I am developing a workflow for importing new names.
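The general idea of a thresholded name lookup can be sketched as follows. This is a minimal illustration, not the library Nick is using: the `BIRTH_COUNTS` table, its numbers, and the `guess_gender` function are all hypothetical. A name gets a label only when the birth-record counts favor one value at or above the threshold; otherwise the function abstains.

```python
from typing import Optional

# Hypothetical birth-record counts: name -> (female_count, male_count).
BIRTH_COUNTS = {
    "maria": (9950, 50),
    "john": (50, 9950),
    "robin": (5200, 4800),
}


def guess_gender(name: str, threshold: float = 0.99) -> Optional[str]:
    """Return "female" or "male" when the records are decisive, else None."""
    counts = BIRTH_COUNTS.get(name.strip().lower())
    if counts is None:
        # Name is absent from the records (e.g. from a country not covered).
        return None
    female, male = counts
    total = female + male
    if total == 0:
        return None
    if female / total >= threshold:
        return "female"
    if male / total >= threshold:
        return "male"
    # Ambiguous name: abstain rather than guess.
    return None
```

With a high threshold, ambiguous names such as the hypothetical "robin" entry and names missing from the table both return `None`, which is why a workflow for adding new names and manual identification are still needed.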

HG: We could also use WikiData for this. You can query WikiData for particular persons, attributes associated with them. Maybe that could improve accuracy.
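A lookup of this kind could be sketched against the public Wikidata SPARQL endpoint. The code below is an assumption-laden illustration rather than anything agreed at the meeting: the function names are hypothetical, and matching on an exact English label is a simplification that will miss many people. It uses the Wikidata properties P31 (instance of), P21 (sex or gender), and the item Q5 (human).

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_gender_query(full_name: str, limit: int = 5) -> str:
    """Build a SPARQL query for humans whose English label equals full_name."""
    # Escape backslashes and double quotes so the name is a valid SPARQL literal.
    escaped = full_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?person ?genderLabel WHERE {\n"
        f'  ?person rdfs:label "{escaped}"@en ;\n'
        "          wdt:P31 wd:Q5 ;\n"
        "          wdt:P21 ?gender .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def fetch_genders(full_name: str) -> list:
    """Run the query over the network and return the gender labels found."""
    params = urllib.parse.urlencode(
        {"query": build_gender_query(full_name), "format": "json"}
    )
    request = urllib.request.Request(
        WIKIDATA_SPARQL_ENDPOINT + "?" + params,
        # Hypothetical client identifier for this example.
        headers={"User-Agent": "bigbang-minutes-example/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [row["genderLabel"]["value"] for row in data["results"]["bindings"]]
```

Only well-known participants are likely to have Wikidata entries, so this would supplement rather than replace a name-based approach.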

ND: There's some work using GitHub commits and Google+, which has gender marked. Not sure whether it will make a big difference.

How to deal with ongoing research in the repository

How do we reuse useful code?

Let's put more general features in the library and then put an illustration of it in examples.

HG: That puts more work on the developer side, because the developer doesn't always have the incentive to generalize the function. Rather than having most of the logic in the example itself, we should break the functionality out into the library code.

ND: Jupyter notebook is good for documentation of research, the logic of it. Everything will have different needs. A lot of the benefit of BigBang is showing people are investigating this question.

ND: How do we manage the growth of our personal scientific code and general-use software? If we agree that there's a differentiation between the two, and agree to communicate early about the general code, that would be good.

ND: We discussed having a sort of chat thing. Should we have one?

HG: I recommend Gitter.

SB: I will set it up.

What to do before next release

ND: What about documentation?

SB: Great. Let's say that's something we need for the next release and discuss it in our next meeting, as it's a larger discussion.

https://github.com/datactive/bigbang/issues/286