Scalability issue in decompInit_lnd #2995

briandobbins · 2025-03-09T22:15:30Z

In the decompInit_lnd routine (in src/main/decompInitMod.F90), there's a performance issue in the section below:

Lines 226 to 236 in 7ff6061

    do m = 1,nclumps
       if ((clumps(m)%owner >  clumps(cid)%owner) .or. &
           (clumps(m)%owner == clumps(cid)%owner .and. m > cid)) then
          clumps(m)%begg = clumps(m)%begg + 1
       endif
       if ((clumps(m)%owner >  clumps(cid)%owner) .or. &
           (clumps(m)%owner == clumps(cid)%owner .and. m >= cid)) then
          clumps(m)%endg = clumps(m)%endg + 1
       endif
    enddo

This happens because the main loop, outside of that, is over every cell, and this internal loop is over the number of clumps, which is at least equal to the number of PEs. For km-scale runs, this ends up being quite large - e.g., on the 3.75km test case, the outer loop is ~42M iterations and the inner loop is >~40K, since that's the minimum number of PEs we're able to run the case on. The conditional does restrict the inner loop to running on only ~12.4M of the 42M cells, but that is still roughly 12.4M x 40K, or about 5 x 10^11 inner iterations, and it's a problem to have the cost scale with the number of cores.

In terms of data, the loop above took between 754 and 1044 seconds, averaging 804 seconds across all PEs. That's ~86% of the InitializeRealize call.

I've got a few ideas on things to try, including simply changing the complex conditionals as a temporary work-around, as well as saving/reading a decomposition, but would welcome insights from folks who know the land model.
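One further option, as a rough sketch rather than a tested patch: each land cell only ever adds 1 to begg/endg of the clumps that sort after its own clump in (owner, clump index) order, so the final bounds look like a prefix sum of per-clump cell counts. The per-cell loop could then just count cells per clump, and a single pass afterwards could fill in begg/endg. The sketch below assumes begg starts at 1 and endg at 0, and that owners are numbered 0..npes-1. `clump_t`, `cell_cid`, `bounds_incremental` and `bounds_prefix_sum` are hypothetical names for illustration, not the actual CTSM types or routines.

```fortran
module clump_bounds_sketch
  implicit none
  private
  public :: clump_t, bounds_incremental, bounds_prefix_sum

  ! Hypothetical, stripped-down stand-in for the model's clump type.
  type :: clump_t
     integer :: owner  = 0   ! owning PE, assumed 0-based
     integer :: ncells = 0   ! land cells assigned to this clump
     integer :: begg   = 1   ! first gridcell index (assumed initial value 1)
     integer :: endg   = 0   ! last gridcell index  (assumed initial value 0)
  end type clump_t

contains

  ! Reference version: the per-cell update pattern quoted in this issue.
  ! Cost is O(ncells * nclumps). cell_cid(n) is the clump id of land cell n.
  subroutine bounds_incremental(clumps, cell_cid)
    type(clump_t), intent(inout) :: clumps(:)
    integer,       intent(in)    :: cell_cid(:)
    integer :: n, m, cid

    clumps%ncells = 0
    clumps%begg   = 1
    clumps%endg   = 0
    do n = 1, size(cell_cid)
       cid = cell_cid(n)
       clumps(cid)%ncells = clumps(cid)%ncells + 1
       do m = 1, size(clumps)
          if ((clumps(m)%owner >  clumps(cid)%owner) .or. &
              (clumps(m)%owner == clumps(cid)%owner .and. m > cid)) then
             clumps(m)%begg = clumps(m)%begg + 1
          endif
          if ((clumps(m)%owner >  clumps(cid)%owner) .or. &
              (clumps(m)%owner == clumps(cid)%owner .and. m >= cid)) then
             clumps(m)%endg = clumps(m)%endg + 1
          endif
       enddo
    enddo
  end subroutine bounds_incremental

  ! Alternative: count cells per clump, then assign bounds in one pass.
  ! Cost is O(ncells + nclumps + npes).
  subroutine bounds_prefix_sum(clumps, cell_cid, npes)
    type(clump_t), intent(inout) :: clumps(:)
    integer,       intent(in)    :: cell_cid(:)
    integer,       intent(in)    :: npes
    integer :: offset(0:npes-1)   ! running cell offset for each owner
    integer :: n, m, p, cnt, running

    ! Pass 1: count cells per clump (no loop over clumps here).
    clumps%ncells = 0
    do n = 1, size(cell_cid)
       clumps(cell_cid(n))%ncells = clumps(cell_cid(n))%ncells + 1
    enddo

    ! Total cells per owner, then exclusive prefix sum over owners.
    offset = 0
    do m = 1, size(clumps)
       offset(clumps(m)%owner) = offset(clumps(m)%owner) + clumps(m)%ncells
    enddo
    running = 0
    do p = 0, npes - 1
       cnt       = offset(p)
       offset(p) = running
       running   = running + cnt
    enddo

    ! Ascending m within each owner reproduces the (owner, m) ordering.
    do m = 1, size(clumps)
       p = clumps(m)%owner
       clumps(m)%begg = offset(p) + 1
       clumps(m)%endg = offset(p) + clumps(m)%ncells
       offset(p)      = clumps(m)%endg
    enddo
  end subroutine bounds_prefix_sum

end module clump_bounds_sketch
```

On a small case (4 clumps with owners 0,1,0,1 and 7 cells) both routines give the same begg/endg. Whether the real routine has other state that depends on the incremental updates is something I haven't checked.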

Perhaps the other issue here, which may be more important in the end, is that 'clumps' appears to be allocated at full size on every rank, i.e., each rank holds the entries for all ranks?

CTSM/src/main/decompInitMod.F90

Line 118 in 7ff6061

allocate(clumps(nclumps), stat=ier)

Again, I welcome input here - this is a challenge for memory scalability, at least, and unless the full array is needed elsewhere, we should move to a local-to-the-rank structure.

Anyway, just getting the issue in - I think I can create work-arounds for the near-term needs, but would be happy to chat with any land folks on this and see if we can get a SIF or some other way to focus on addressing this soon, too.

Thanks!

wwieder · 2025-03-11T20:11:44Z

Thanks for creating this issue, @briandobbins. Let us know the timeline that's helpful for this to be addressed. As you know, we're kind of slammed with prepping for CLM6 / CESM3, so addressing this after the release will be more realistic. That said, we don't want poor scalability hindering the work you're trying to do for high res work.
