
Hierarchical Causal Models #236

Open · wants to merge 97 commits into main from HCM-fig2


adamrupe
Collaborator

@adamrupe adamrupe commented Sep 6, 2024

Closes #278

This PR implements Hierarchical Causal Models (Weinstein and Blei, 2024).

This PR will be ready for review when the following items have been implemented and tested:

  • Algorithm 1: Graphical algorithm for collapsing a hierarchical causal graphical model (HCGM). This algorithm transforms the graph of a hierarchical causal model (HCM) into the graph of its collapsed model, following Definition 4.
  • Algorithm 2: Graphical algorithm for augmenting a collapsed model. This algorithm adds an augmentation variable to a collapsed HCGM, following Definition 6.
  • Algorithm 3: Graphical algorithm for marginalizing an augmented model. This algorithm marginalizes out parent(s) of an augmentation variable (Section 5.2).
  • Causal query pipeline: Uses Algorithms 1-3 (as needed) to check whether a causal query is identifiable in the HCM. Whether Algorithms 2 and 3 are used depends on the causal query, i.e., whether a variable needs to be added by augmentation (Algorithm 2) and then whether another variable needs to be marginalized out (Algorithm 3).
  • HSCM tests
  • High-level example (with real-world motivation) that shows how to run a causal query on an HCM
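The pipeline item describes control flow more than new graph logic, so a minimal sketch of that flow may help reviewers: collapsing always runs, augmentation runs only when the query needs an augmentation variable, and marginalization runs only after augmentation. The name `run_causal_query` and its callable parameters are hypothetical placeholders, not y0's API; the real steps presumably live in `src/y0/hierarchical.py` (the file Codecov reports).

```python
from typing import Callable, Optional, TypeVar

G = TypeVar("G")  # whatever graph type the real implementation uses


def run_causal_query(
    hcgm: G,
    collapse: Callable[[G], G],
    is_identifiable: Callable[[G], bool],
    augment: Optional[Callable[[G], G]] = None,
    marginalize: Optional[Callable[[G], G]] = None,
) -> bool:
    """Hypothetical driver: collapse, optionally augment and marginalize, then test identifiability."""
    if marginalize is not None and augment is None:
        # Algorithm 3 removes parents of an augmentation variable, so it needs Algorithm 2 first.
        raise ValueError("marginalization requires an augmentation step")
    graph = collapse(hcgm)  # Algorithm 1 always runs
    if augment is not None:
        graph = augment(graph)  # Algorithm 2, only if the query needs it
        if marginalize is not None:
            graph = marginalize(graph)  # Algorithm 3, only after augmentation
    return is_identifiable(graph)
```

Passing the steps in as callables keeps the ordering logic testable with stubs, independent of how the graphs themselves are represented.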
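Algorithm 3 marginalizes variables out of a graph. A common generic way to remove one node from an acyclic directed mixed graph is latent projection: a parent-to-child path through the node becomes a directed edge, and two children of the node (or a child and a bidirected neighbor of the node) become bidirected neighbors. The sketch below assumes directed edges are `(tail, head)` tuples and bidirected edges are frozensets; `marginalize_node` is a hypothetical helper, not part of y0, and whether Algorithm 3 reduces to exactly this rule for the Q variables of a collapsed HCM is an assumption to check against Section 5.2.

```python
from itertools import combinations

Directed = set[tuple[str, str]]
Bidirected = set[frozenset[str]]


def marginalize_node(directed: Directed, bidirected: Bidirected, node: str) -> tuple[Directed, Bidirected]:
    """Latent projection of a single node out of an ADMG (generic sketch)."""
    parents = {a for a, b in directed if b == node}
    children = {b for a, b in directed if a == node}
    siblings = {x for pair in bidirected if node in pair for x in pair if x != node}

    # Keep every edge that does not touch the removed node.
    new_directed = {(a, b) for a, b in directed if node not in (a, b)}
    new_bidirected = {pair for pair in bidirected if node not in pair}

    # a -> node -> b  becomes  a -> b
    new_directed |= {(p, c) for p in parents for c in children if p != c}
    # a <- node -> b  becomes  a <-> b
    new_bidirected |= {frozenset((c1, c2)) for c1, c2 in combinations(sorted(children), 2)}
    # a <-> node -> b  becomes  a <-> b (skipping self-loops)
    new_bidirected |= {frozenset((s, c)) for s in siblings for c in children if s != c}
    return new_directed, new_bidirected
```

For example, removing `M` from `A -> M -> Y` leaves `A -> Y`, while removing a common parent `L` of `X` and `Y` leaves `X <-> Y`.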

@adamrupe adamrupe linked an issue Sep 6, 2024 that may be closed by this pull request

codecov bot commented Sep 6, 2024

Codecov Report

Attention: Patch coverage is 88.12500% with 19 lines in your changes missing coverage. Please review.

Project coverage is 81.27%. Comparing base (05a9456) to head (3af8c66).
Report is 4 commits behind head on main.

Files with missing lines | Patch % | Lines
src/y0/hierarchical.py | 88.12% | 9 Missing and 10 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #236      +/-   ##
==========================================
+ Coverage   80.87%   81.27%   +0.39%     
==========================================
  Files          50       51       +1     
  Lines        4135     4314     +179     
  Branches      845      981     +136     
==========================================
+ Hits         3344     3506     +162     
- Misses        668      670       +2     
- Partials      123      138      +15     


@cthoyt
Member

cthoyt commented Sep 10, 2024

Hi @adamrupe, can you add a checklist to the PR description listing the tasks that need to be completed before this PR is ready for review?


@Copilot Copilot AI left a comment


Copilot reviewed 18 out of 19 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • tox.ini: Language not supported

@djinnome djinnome marked this pull request as ready for review January 24, 2025 00:01
cthoyt

This comment was marked as outdated.

@cthoyt cthoyt force-pushed the HCM-fig2 branch 2 times, most recently from 3237de1 to ef54579 Compare February 3, 2025 08:42
Member

@cthoyt cthoyt left a comment


I did a major refactor to address the software issues from the last round. The next steps for @adamrupe and @djinnome are:

  1. Read through the code and familiarize yourselves with the new interface
  2. Comment on or address all TODOs I left in the code (there aren't many)
  3. Tests
    • Either implement tests for conversion to HSCM or delete the conversion code
    • Test augment_collapsed_model
  4. Check the notebook, which used to raise some exceptions; I replaced the code that raised them with the high-level identify_outcomes API. Please review it to make sure that all the places where no estimand is produced because the graph has a single c-component are correct.
  5. Create a high-level, real-world example that demonstrates using all of the code in a story-driven workflow (i.e., do not explain the math, only explain which of the functions you implemented solve the problem). Use https://github.com/y0-causal-inference/y0/blob/main/notebooks/Counterfactual%20Transportability.ipynb as the gold standard for what a great notebook with applications looks like.
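For item 4, the "single c-component" cases can be sanity-checked by hand: the c-components of a graph are the connected components of its bidirected part, and nodes with no bidirected edges form singleton components. The sketch below computes them with union-find; `c_components` and its input format (a node list plus bidirected pairs, all of whose endpoints appear in the node list) are illustrative assumptions, independent of y0's graph classes.

```python
from collections.abc import Iterable


def c_components(nodes: Iterable[str], bidirected: Iterable[tuple[str, str]]) -> list[frozenset[str]]:
    """Group nodes into c-components: connected components over bidirected edges only."""
    node_list = list(nodes)
    parent = {n: n for n in node_list}  # union-find forest, each node starts as its own root

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in bidirected:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    groups: dict[str, set[str]] = {}
    for n in node_list:
        groups.setdefault(find(n), set()).add(n)
    return [frozenset(g) for g in groups.values()]
```

A graph "has a single c-component" exactly when this returns a list of length one.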

Along the way, please make sure that you check the CI/CD system for automated, objective feedback on code quality. @adamrupe, if you're not familiar with how to do this, I am happy to show you.

@adamrupe
Collaborator Author

adamrupe commented Feb 5, 2025

@cthoyt What's your recommendation for handling merge conflicts with Jupyter notebooks? I need to resolve them before I can pull your updates. I'm also not familiar with the CI/CD system, so if you could talk me through it, that would be great.

@cthoyt
Member

cthoyt commented Feb 5, 2025

@adamrupe, before merging, copy your local notebook to your desktop. While merging, discard your repository's copy entirely and overwrite it with the remote version. Then you can manually compare the copy on your desktop with the new version from the remote repository side by side.

The best way to avoid this kind of conflict is to never leave changes unpushed when you finish working, and to always pull before you start working again.
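When comparing the desktop copy with the remote version, the raw `.ipynb` JSON is noisy because outputs and execution counts change on every run. A small stdlib helper that diffs only code-cell sources can make the side-by-side review easier. The function names below are hypothetical, and dedicated tools such as nbdime also exist for notebook-aware diffing and merging.

```python
import difflib
import json
from typing import Any


def load_notebook(path: str) -> dict[str, Any]:
    """Parse an .ipynb file (it is plain JSON)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def notebook_cell_sources(nb: dict[str, Any]) -> list[str]:
    """Return the source of each code cell, ignoring markdown, outputs, and execution counts."""
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # nbformat stores source either as one string or as a list of lines
            sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def _as_lines(nb: dict[str, Any]) -> list[str]:
    lines = []
    for i, src in enumerate(notebook_cell_sources(nb)):
        lines.append(f"# --- code cell {i} ---\n")
        lines.extend(line + "\n" for line in src.splitlines())
    return lines


def diff_notebooks(local: dict[str, Any], remote: dict[str, Any]) -> str:
    """Unified diff of code-cell sources; an empty string means the code is identical."""
    return "".join(
        difflib.unified_diff(_as_lines(local), _as_lines(remote), fromfile="desktop copy", tofile="remote")
    )
```

Usage would look like `print(diff_notebooks(load_notebook(desktop_path), load_notebook(repo_path)))`, where both paths are whatever locations you used.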


The short explanation of how to use the CI/CD system: you can always scroll to the bottom of this pull request (#236) and look at the feedback from GitHub running our unit tests, linting, and code quality checks.

This is what it looks like to me right now:

[Screenshot of the CI check results on this PR, 2025-02-05 at 23:51:20]

You can click on any of the rows with a red X, which brings you to the page that ran the tests for you. Right now, you will be able to see all of the output from running pytest. Unfortunately, pytest reports timings and warnings after test failures, so you have to scroll up a bit; see https://github.com/y0-causal-inference/y0/actions/runs/13134504627/job/36646756591?pr=236#step:6:69 for the currently failing test.

Similarly, while you're still getting used to having code quality checks, you will probably see errors from the linting or type-checking scripts as well, which you can view in the same way.

In a team coding setting, the expectation is that you push often and check the feedback CI gives each time. This helps you improve your code iteratively, with fully objective feedback that you don't have to wait for someone else to give you. As an alternative to CI/CD on GitHub, you can run tox locally, which also gives a reproducible run of the entire test suite.

There's documentation in the README on how to use all of the nice development tools built into this repo at https://github.com/y0-causal-inference/y0?tab=readme-ov-file#%EF%B8%8F-for-developers

If you get stuck on any part of this that isn't self-explanatory, I'm happy to plan a video chat tomorrow, or sometime next week before 6 PM Germany time.

@adamrupe
Collaborator Author

adamrupe commented Feb 6, 2025

Awesome, thanks @cthoyt! That makes sense, and Jeremy and Richard have already shown me how to use tox a bit. I've pulled your changes and I'm going through them now. I'll add a test_to_hscm.

@adamrupe
Collaborator Author

@cthoyt I've refactored HSCMs and filled in the HCM.to_hscm() tests, so all tests are now passing. However, I'm getting a deprecation warning from another part of the codebase:

src/y0/examples/__init__.py:1173: FutureWarning: 
Downcasting behavior in `replace` is deprecated and will be removed in a future version. 
To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. 
To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
    asia_df = pd.read_csv(ASIA_PATH).replace({"yes": 1, "no": -1})
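One way to sidestep this warning for a yes/no table like the Asia data is to map the strings explicitly instead of relying on `replace`'s silent downcasting, so the integer dtype comes straight from `Series.map`. This is a sketch assuming every column contains only `"yes"`/`"no"`; `encode_yes_no` is a hypothetical helper, and the route the warning itself suggests (opting in to `future.no_silent_downcasting` and calling `infer_objects`) would also work.

```python
import pandas as pd

YES_NO = {"yes": 1, "no": -1}


def encode_yes_no(df: pd.DataFrame) -> pd.DataFrame:
    """Map "yes"/"no" strings to 1/-1 column by column (assumes no other values are present)."""
    # Series.map with a dict yields an integer column when every value is found,
    # so no object-to-int downcasting (and no FutureWarning) is involved.
    encoded = df.apply(lambda column: column.map(YES_NO))
    if encoded.isna().any().any():
        # Unmapped values become NaN; fail loudly instead of silently producing floats.
        raise ValueError("found values other than 'yes'/'no'")
    return encoded
```

The call site would then read `asia_df = encode_yes_no(pd.read_csv(ASIA_PATH))`.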

There is also a mypy error from graph.py:

mypy: commands[0]> mypy --ignore-missing-imports --strict src/ tests/test_hierarchical.py
src/y0/graph.py:480: error: Function is missing a type annotation for one or more arguments  [no-untyped-def]

Since I didn't write this code, I didn't want to edit it, but both look like straightforward fixes.
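The mypy message means that at least one argument of the function at `graph.py:480` lacks a type annotation; under `--strict`, every argument and the return type need one. That function isn't shown here, so the sketch uses a hypothetical `ancestors` helper to illustrate the before/after, plus a rough `inspect`-based check that lists unannotated parameters (an approximation for illustration, not how mypy works internally).

```python
import inspect
from typing import Any, Callable


def unannotated_parameters(func: Callable[..., Any]) -> list[str]:
    """List parameters with no annotation (ignoring self/cls), roughly what no-untyped-def flags."""
    return [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.annotation is inspect.Parameter.empty and name not in ("self", "cls")
    ]


def ancestors(parents: dict[str, set[str]], node: str) -> set[str]:
    """Fully annotated: every argument and the return type carry a type (hypothetical helper)."""
    seen: set[str] = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def ancestors_untyped(parents, node: str) -> set[str]:  # `parents` unannotated: mypy --strict errors here
    return ancestors(parents, node)
```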

I'll add a test for augment_collapsed_model, and Jeremy and I have discussed a high-level notebook that I'll create as well. This should then complete your requested changes. I'll ping again when they are all complete.

@cthoyt
Member

cthoyt commented Mar 1, 2025

@adamrupe, you should be unblocked on the CI/CD pipeline now. Looking forward to seeing a nice case-study notebook; then we can finish this PR!

Projects: None yet
Development: Successfully merging this pull request may close these issues:
  • Hierarchical Causal Models
  • Implement hierarchical causal models from figure 2 in pygraphviz
2 participants