Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: Externally owned data in hypre_CSRMatrix #1095

Open
ictmay opened this issue May 22, 2024 · 11 comments
Open

Question: Externally owned data in hypre_CSRMatrix #1095

ictmay opened this issue May 22, 2024 · 11 comments

Comments

@ictmay
Copy link

ictmay commented May 22, 2024

I am working with the LANL AMP team. We have a parallel CSR matrix format that should be compatible with the Hypre format. Ideally we would wrap our format into a Hypre IJMatrix with a minimal amount of duplicate data.

Question #609 provides some initial information on how to approach this. I have looked at which (de)allocations are guarded by owns_data inside hypre_ParCSRMatrixCreate, hypre_CSRMatrixCreate, and the matching destroy functions to decide what we should own versus what Hypre should own. The heavy data all lives in the diag and offd blocks.

At present I do the following:

  1. Create IJMatrix and set type to HYPRE_PARCSR, do not call initialize.
  2. Call hypre_IJMatrixCreateParCSR. This sets owns_data in ParCSR to 1 and creates diag and offd structs.
  3. Set need_aux to zero inside the IJMatrix->translator.
  4. Grab the diag and offd blocks and call hypre_CSRMatrixInitialize manually. This allocates the ->i field which is not guarded by owns_data.
  5. Set ownership to zero on diag and offd via hypre_CSRMatrixSetDataOwner.
  6. Fill the ->i fields.
  7. Set the ->big_j, ->j, and ->data fields of diag and offd to reference our data. These are all guarded by owns_data and are the heaviest memory requirements.

The issue comes from the ParCSR->col_map_offd field. I'd prefer to leave ParCSR->owns_data==1 so that it will manage the setup/teardown of the CSRMatrix structs themselves and then I can just set owns_data to zero in there. This means that Hypre needs to also own the col_map_offd field. The main benefit is that HYPRE_IJMatrixDestroy does (seem to) do all of the correct deallocations.

That is a long-winded set up to premise a couple of questions:

  • Is this approach valid, or should I set ParCSR->owns_data=0 and manage more of the setup/teardown on my end?
  • I can use HYPRE_IJMatrixAssemble to get col_map_offd created and filled. The internal function hypre_IJMatrixAssembleParCSR does not check the value of offd->owns_data and frees offd->big_j yielding an eventual double-free (line 2922 of IJMatrix_parcsr.c).
  • I'm a little confused by the code that fills col_map_offd and rearranges offd->j. What are the exact requirements on col_map_offd? It must be more than just a map from local->global indices since offd->j also gets shuffled.

I did add a check for the offd->owns_data flag into hypre_IJMatrixAssembleParCSR, and that does avoid the double-free issue. Surprisingly this allows BoomerAMG to work as a solver, but I think that is a fluke. This doesn't really solve the overall issue. That function still modifies values in the big_j and j arrays that it doesn't own. As a minimal solution it should check the owns_data flag and throw an error.

That's quite the wall of text. I appreciate any help, and please let me know if any clarifications are needed.

@ictmay ictmay closed this as completed Sep 19, 2024
@ictmay ictmay reopened this Nov 20, 2024
@ictmay
Copy link
Author

ictmay commented Nov 20, 2024

I am re-opening this to update it to what seems to be working. I have only medium confidence here, so grain of salt. Any input on whether this is valid would still be appreciated.

The setup is as follows. This is not a direct copy of the code but should get the point across.

  1. Call HYPRE_IJMatrixCreate(..., &d_matrix) and HYPRE_IJMatrixSetObjectType to make a new parcsr matrix
  2. Call hypre_IJMatrixCreateParCSR( d_matrix )
  3. Extract the parcsr matrix via hypre_ParCSRMatrix *par_matrix = (hypre_ParCSRMatrix *) d_matrix->object
  4. Extract on/off diagonal blocks hypre_CSRMatrix *diag = par_matrix->diag and hypre_CSRMatrix *offd = par_matrix->offd
  5. Extract auxiliary matrix d_matrix->translator and set need_aux to zero inside it
  6. Call hypre_CSRMatrixInitialize on both diag and offd
  7. Fill diag->i according to my externally owned matrix
  8. Fill offd->i if I have one, otherwise just set all zeros
  9. Set ownership to zero with hypre_CSRMatrixSetDataOwner( diag, 0 ) and hypre_CSRMatrixSetDataOwner( offd, 0 )
  10. Set diag->big_j, diag->j, and diag->data, to point at corresponding fields of external matrix. ->big_j presumably optional
  11. Do the same for offd
  12. Set diag->num_nonzeros and offd->num_nonzeros
  13. Set par_matrix->col_map_offd to map converting from local to global indices in off diagonal block
  14. Set offd->num_cols to match length of col_map_offd
  15. Call hypre_CSRMatrixSetRownnz on diag and offd
  16. Finally, set d_matrix->assemble_flag = 1

A few of these steps are non-trivial and potentially wrong.

Step 6 allocates the ->i field in diag and offd. These fields are not protected by ->owns_data so Hypre needs to allocate them (they are small so it doesn't really matter). Step 15 does the same thing for the same reason.

Step 13 and the layouts of ->j in steps 10 and 11 are also touchy. In diag the j field should be sorted to have the diagonal element occur first in each row, and the remaining indices should be sorted in ascending order. The j field in offd should have ascending indices in each row. Finally, col_map_offd should be in ascending order (changing the local indices to match).

@victorapm
Copy link
Contributor

victorapm commented Nov 21, 2024

@ictmay thank you for the update and apologize for the delay.

On the surface, the logic seems correct including your comments about column ordering (except for remaining column indices being sorted in ascending order: that is not strictly necessary but advisable). Is this implementation openly available somewhere in AMP so we can take a look?

We can also discuss about adding owns field to some of the pointers in our matrix structure if you find it useful. Other suggestions to make the implementation of your interface easier are also welcomed!

Thank you!

@ictmay
Copy link
Author

ictmay commented Nov 21, 2024

@victorapm No worries on the delay, and thank you for getting back to us.

For access to the source I'll have to talk to Bobby. I only work on the internal (restricted enclave) version, and I'm not sure what all has made it into the externally viewable version.

Regarding the owns flag(s), the main item is the col_map_offd member of hypre_ParCSRMatrix. I neglected to mention it above, but that one is also not guarded. Right now I am careful to null that field out before calling HYPRE_IJMatrixDestroy.

The column map isn't so large (except when it is...), so it could easily fall into the same category as the ->i and ->rownnz fields inside the on/off diagonal blocks. It also seems that Hypre will break in mysterious ways if this map is not ordered in exactly the right way. Having Hypre own and create this field would be fine and I could just make my own copy/reference of it after the fact for our communicators.

A nice solution that would avoid most of the touchy steps above, though one that would take some non-zero dev time and testing time, would be to make HYPRE_IJMatrixAssemble (or some similar function) callable in the non-owned data case. This would let Hypre set up all of the meta-data as it sees fit, and could remove many of the steps above.

I don't know how often people want to do shallow-copies into Hypre, and without broader demand it makes more sense to have this as an interface we maintain rather than something that gets added to Hypre.

@ictmay
Copy link
Author

ictmay commented Nov 21, 2024

@victorapm All of the shallow copy code is present in our external code here:

https://github.com/AdvancedMultiPhysics/AMP/blob/master/src/matrices/data/hypre/HypreMatrixAdaptor.cpp

Lines 41 -- 43 create the IJMatrix, and the process described above starts on line 136 and continues to the end of the file.

@victorapm
Copy link
Contributor

Hi @ictmay , I'm sorry I dropped the ball on this issue!

I looked at the link you shared in the previous post and saw there has been some recent changes in AMP. Is this still an issue to you? Is there something we can help with, e.g., providing an ownership flag for col_map_offd? Please, let me know!

@ictmay
Copy link
Author

ictmay commented Feb 24, 2025

@victorapm No worries.

Short answer, yes a flag guarding col_map_offd would be helpful.

There have been some changes from our side, but the majority of the shallow copy approach is the same as outlined above. The main points are swapping non-zeros per row for the starting positions of each row, and deleting the global column indices when no longer needed. We have since tested on more machines and I have some more confidence that the above is valid, at least for now.

Adding a guard around col_map_offd that references the owns_data field would be helpful. If there is some situation you can imagine where owns_data would be zero but col_map_offd is owned by Hypre then a new flag could be introduced. Though, at that point we might as well just null field out before calling destroy like we are doing now.

The biggest hurdle was getting column ordering within each row correct (diag block with diagonal element first and ascending after that, offd ascending order on local column specifically). Having that documented somewhere would be helpful. Even better would be a higher level function to call that would do the sort. That way if the expected order changes we wouldn't have to rewrite this.

The ideal would be some function to create a ParCSR matrix from user data abstracting all of this away. It would take in diag/offd row pointers, local column indices, data fields, and offd column map from the user and a flag to allow/disallow mutation of them. When disallowed Hypre would check that everything is compatible with its assumptions (e.g. column ordering) and error if not.

That last point is not needed by us presently, but if there is broader interest in supporting these shallow copies that would go a long way.

@victorapm
Copy link
Contributor

Adding a guard around col_map_offd that references the owns_data field would be helpful

@ictmay I got confused earlier, now I see the code is already checking owns_data to decide whether to free col_map_offd or not. Would this be enough for you or would you need a dedicate owns_col_map_offd flag to decide whether to free or not col_map_offd?

The ideal would be some function to create a ParCSR matrix from user data abstracting all of this away. It would take in diag/offd row pointers, local column indices, data fields, and offd column map from the user and a flag to allow/disallow mutation of them. When disallowed Hypre would check that everything is compatible with its assumptions (e.g. column ordering) and error if not.

Sounds good, I will check with the team about this, thanks for the suggestion

Lastly, if I may suggest a performance improvement to the HypreMatrixAdaptor code:

https://github.com/AdvancedMultiPhysics/AMP/blob/626480de8304914e4f701eb135b0fa7156d160c9/src/matrices/data/hypre/HypreMatrixAdaptor.cpp#L82-L101

When running on GPUs, we want to minimize the number of calls to HYPRE_IJMatrixSetValues and instead set a block of rows at once (currently, it's setting one row per call)

Hope this helps

@ictmay
Copy link
Author

ictmay commented Feb 24, 2025

@victorapm I guess I was also confused. It looks like col_map_offd is in fact guarded by owns_data. I'm not sure why I thought it wasn't. We don't need a special flag for the column map, we own it along with the other fields. I'll get rid of that extra step of nulling that field before calling destroy. Assuming that that doesn't break anything then we can close this issue.

If you all do decide to make a function to create a matrix from user data I would be happy to put it to use.

Regarding the performance improvement, I will fix this if I get time, but that path is not performance critical right now anyway. That should only get called if we can't do a shallow copy, which in turn only happens if the incoming matrix isn't in our own format (e.g. a user has a Trilinos matrix and wants BoomerAMG as a preconditioner).

Thank you again for all the help!

@ictmay
Copy link
Author

ictmay commented Feb 24, 2025

@victorapm Apologies, the intricacies of this are coming back to me in waves. The col_map_offd thing was a bit of a red herring.

The real problem is ->i and ->rownnz not being guarded by owns_data in hypre_CSRMatrixDestroy. Because those get free'd no matter what, I was using hypre_CSRMatrixInitialize in my code and not setting owns_data to 0 on the enclosing ParCSR.

Guarding those fields means that I can safely pass my equivalent field(s) in without them getting free'd by Hypre. That does open the question of whether the diag and offd structs themselves should be guarded by owns_data in hypre_ParCSRMatrixDestroy. Currently they are guarded but they probably shouldn't be. Those will have their own internal owns_data flags set to zero, but the structs themselves do still need to get free'd when the outer-most Hypre object gets destroyed.

Hopefully that makes sense.

@victorapm
Copy link
Contributor

@ictmay thank you for explaining everything.

We could add a owns_row_data member in the CSRMatrix data structure to decide on freeing ->i and ->rownnz. Does that sound good?

Regarding owns_data in ParCSRMatrix, which frees both diag and offd matrices, I see your point about whether this should be explicitly guarded. However, changing this behavior would introduce memory leaks in several codes, so I’d be cautious about making such a modification. Perhaps we could explore alternatives to address any concerns you have.

@ictmay
Copy link
Author

ictmay commented Mar 4, 2025

@victorapm Adding a guard around the ->i and ->rownnz fields to prevent them from getting freed when not owned would be helpful.

I take your point about changing the behavior of a free being touchy, though making diag and offd matrices un-guarded would lead to them being freed in more cases rather than fewer. I can handle either choice of guarded vs unguarded, so I mention it more for completeness than anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants