A unified interface for different collection types #2671

AdamZh0u · 2025-02-09T03:47:32Z

What's the problem this feature will solve?
Currently, Mesa has multiple collection implementations (AgentSet, CellCollection) with overlapping functionality. This leads to:

- Code duplication across different collection types
- Inconsistent interfaces between collections
- Difficulty in maintaining and extending collection functionality
- Challenges when developing new collection types (e.g., for mesa-geo's LayerCollection, @wang-boyu)

Describe the solution you'd like
Introduce a new CollectionBase class that would:

Provide a common foundation for all Mesa collections with shared functionality:
- Basic attributes (__contains__, __len__)
- Set and get operations
- Sequence operations (indexing, iteration)
- Filtering and selection (select, shuffle, sort)
- Method invocation on items (map)
- Aggregation operations
Use weak references for proper memory management and garbage collection, crucial for simulation performance

Example usage:

import weakref
from random import Random
from typing import Generic, Iterable, TypeVar

# Agent, Cell, and Layer are assumed to be imported from Mesa and mesa-geo.
T = TypeVar("T")

class CollectionBase(Generic[T]):
    """A base collection class that provides set-like and sequence functionality."""
    def __init__(self, items: Iterable[T], random: Random | None = None):
        self.random = random  # assumption: kept for shuffle(), as in AgentSet
        self._initialize_storage(items)

    def _initialize_storage(self, items: Iterable[T]) -> None:
        # Keys are held weakly, so something else must own the items.
        self._items = weakref.WeakKeyDictionary({item: None for item in items})

class AgentSet(CollectionBase[Agent]):
    """Agent-specific collection implementation"""
    pass

class CellCollection(CollectionBase[Cell]):
    """Cell-specific collection implementation"""
    pass

class LayerCollection(CollectionBase[Layer]):
    """Geographic layer collection for mesa-geo"""
    pass
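
To make the weak-reference point concrete, here is a minimal sketch. The `Item` class and the `owner` list are hypothetical stand-ins, not Mesa code. It only illustrates that a `WeakKeyDictionary`, as used in `_initialize_storage` above, keeps an entry just as long as something else holds a hard reference to the key.

```python
import gc
import weakref


class Item:
    """Hypothetical stand-in for an agent, cell, or layer."""

    def __init__(self, name: str) -> None:
        self.name = name


# Something else (e.g. a model or a space) must own the hard references.
owner = [Item("a"), Item("b"), Item("c")]

# Same storage idea as _initialize_storage: keys are weakly referenced.
items = weakref.WeakKeyDictionary({item: None for item in owner})
size_before = len(items)

# Dropping the owner's hard reference lets the entry disappear.
removed = owner.pop()
del removed
gc.collect()  # usually unnecessary on CPython, but makes the intent explicit
size_after = len(items)
```

The collection shrinks without any explicit remove call, which is the behaviour the proposal relies on for garbage collection.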

Benefits:

- Reduced code duplication and a consistent interface across all collections
- Easier maintenance and simplified development of new collection types
- Collection-based operations like aggregation and mapping
- Improved performance through shared optimisations

Additional context

Reference implementation: Google Earth Engine API's Collection Implementation

This would be a significant architectural improvement for Mesa, making it more maintainable and extensible while providing a better developer experience.


SongshGeo · 2025-02-09T10:19:35Z

I agree with that. We will use a similar collection implementation to manage layers, regular agents and geo-agents (e.g., with vector geometry). When refactoring Mesa-Geo, we found this will be very useful. ;)
I am excited to hear any updates.

quaquel · 2025-02-09T13:04:43Z

Do we need a full class hierarchy for this, or can we use duck-typing instead? That is, it might be easier to agree on a shared interface across the collection types and standardize this as a protocol
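
A rough sketch of what such a protocol could look like, using `typing.Protocol`. The name `CollectionLike` and the chosen method set are assumptions for illustration only; `NumberBag` is a toy class, not a Mesa collection.

```python
from typing import Callable, Iterator, Optional, Protocol, TypeVar, runtime_checkable

T = TypeVar("T")


@runtime_checkable
class CollectionLike(Protocol[T]):
    """Hypothetical shared interface; the method set is an assumption."""

    def __len__(self) -> int: ...

    def __iter__(self) -> Iterator[T]: ...

    def __contains__(self, item: object) -> bool: ...

    def select(
        self, filter_func: Optional[Callable[[T], bool]] = None
    ) -> "CollectionLike[T]": ...


class NumberBag:
    """Toy collection that never inherits from CollectionLike."""

    def __init__(self, numbers: list) -> None:
        self._numbers = list(numbers)

    def __len__(self) -> int:
        return len(self._numbers)

    def __iter__(self) -> Iterator[int]:
        return iter(self._numbers)

    def __contains__(self, item: object) -> bool:
        return item in self._numbers

    def select(
        self, filter_func: Optional[Callable[[int], bool]] = None
    ) -> "NumberBag":
        return NumberBag(
            [n for n in self._numbers if filter_func is None or filter_func(n)]
        )


def count_selected(
    collection: CollectionLike[int], filter_func: Callable[[int], bool]
) -> int:
    """Works with any object that structurally matches the protocol."""
    return len(collection.select(filter_func))


bag = NumberBag([1, 2, 3, 4])
matches_protocol = isinstance(bag, CollectionLike)  # checks method presence only
even_count = count_selected(bag, lambda n: n % 2 == 0)
```

With this approach each collection keeps its own storage and implementation, and only the method names and signatures are standardized.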

wang-boyu · 2025-02-09T16:28:10Z

protocol vs. generic probably depends on the amount of re-usable code between sub-types, such as between AgentSet and CellCollection. For now their select() methods look pretty much identical, apart from the typing from Agent to Cell:

mesa/mesa/agent.py

Lines 200 to 246 in ba5104f

    
    def select(
        self,
        filter_func: Callable[[Agent], bool] | None = None,
        at_most: int | float = float("inf"),
        inplace: bool = False,
        agent_type: type[Agent] | None = None,
    ) -> AgentSet:
        """Select a subset of agents from the AgentSet based on a filter function and/or quantity limit.

        Args:
            filter_func (Callable[[Agent], bool], optional): A function that takes an Agent and returns True if the
                agent should be included in the result. Defaults to None, meaning no filtering is applied.
            at_most (int | float, optional): The maximum amount of agents to select. Defaults to infinity.
              - If an integer, at most the first number of matching agents are selected.
              - If a float between 0 and 1, at most that fraction of original the agents are selected.
            inplace (bool, optional): If True, modifies the current AgentSet; otherwise, returns a new AgentSet. Defaults to False.
            agent_type (type[Agent], optional): The class type of the agents to select. Defaults to None, meaning no type filtering is applied.

        Returns:
            AgentSet: A new AgentSet containing the selected agents, unless inplace is True, in which case the current AgentSet is updated.

        Notes:
            - at_most just return the first n or fraction of agents. To take a random sample, shuffle() beforehand.
            - at_most is an upper limit. When specifying other criteria, the number of agents returned can be smaller.
        """
        inf = float("inf")
        if filter_func is None and agent_type is None and at_most == inf:
            return self if inplace else copy.copy(self)

        # Check if at_most is of type float
        if at_most <= 1.0 and isinstance(at_most, float):
            at_most = int(len(self) * at_most)  # Note that it rounds down (floor)

        def agent_generator(filter_func, agent_type, at_most):
            count = 0
            for agent in self:
                if count >= at_most:
                    break
                if (not filter_func or filter_func(agent)) and (
                    not agent_type or isinstance(agent, agent_type)
                ):
                    yield agent
                    count += 1

        agents = agent_generator(filter_func, agent_type, at_most)

        return AgentSet(agents, self.random) if not inplace else self._update(agents)

and

mesa/mesa/experimental/cell_space/cell_collection.py

Lines 113 to 145 in ba5104f

    
    def select(
        self,
        filter_func: Callable[[T], bool] | None = None,
        at_most: int | float = float("inf"),
    ):
        """Select cells based on filter function.

        Args:
            filter_func: filter function
            at_most: The maximum amount of cells to select. Defaults to infinity.
              - If an integer, at most the first number of matching cells is selected.
              - If a float between 0 and 1, at most that fraction of original number of cells

        Returns:
            CellCollection

        """
        if filter_func is None and at_most == float("inf"):
            return self

        if at_most <= 1.0 and isinstance(at_most, float):
            at_most = int(len(self) * at_most)  # Note that it rounds down (floor)

        def cell_generator(filter_func, at_most):
            count = 0
            for cell in self:
                if count >= at_most:
                    break
                if not filter_func or filter_func(cell):
                    yield cell
                    count += 1

        return CellCollection(cell_generator(filter_func, at_most), random=self.random)

Using inheritance might help reduce this duplicated code.

On a side note, CellCollection itself is a generic class. I guess there are different kinds of cells (unlike agents where user-defined agents subclass our mesa.Agent, so they're still of type mesa.Agent)?

mesa/mesa/experimental/cell_space/cell_collection.py

Line 32 in ba5104f

class CellCollection(Generic[T]):

If this is the case, then we need to change the definition of CellCollection to something like

class CellCollection(Generic[T], CollectionBase[T]):
    """Cell-specific collection implementation"""
    pass

quaquel · 2025-02-09T16:40:14Z

The weakref stuff is probably AgentSet-specific and will make it hard to generalize, e.g., select. Likewise, the inplace bool is primarily there for performance reasons related to weakrefs.
The typing on CellCollection was done by @Corvince, so I am not fully sure about the reasoning behind it. I am not intimately familiar with the details of Python typing yet. In my understanding, Generic[T] just means that CellCollection is a collection containing only instances of type T. I am not even sure we properly use the power of Generic elsewhere in the CellSpace code. At the moment, we only have a single Cell type, but the code is written such that users can implement and use their own Cell class as well.

wang-boyu · 2025-02-09T18:41:27Z

Thanks for the response!

The weakref stuff is probably agentset specific, and will make it hard to generalize e.g., select. Likewise, the inplace bool is primarily there for performance reasons related to weakrefs.

This isn't really an issue for me - cells can be owned by a discrete space that has all cells:

mesa/mesa/experimental/cell_space/discrete_space.py

Line 63 in 8c24e1b

self._cells: dict[tuple[int, ...], T] = {}

much like how agents are owned by a model:

mesa/mesa/model.py

Line 108 in 8c24e1b

self._agents = {} # the hard references to all agents in the model

Similarly layers are owned by HasPropertyLayers:

mesa/mesa/experimental/cell_space/property_layer.py

Line 207 in 8c24e1b

self._mesa_property_layers = {}

Then we can have the performance improvement for all collections.

At the moment, we only have a single Cell type, but the code is written such that users can implement and use their own Cell class as well.

In this case, CellCollection probably doesn't need to be a generic class. Instead, it could be a collection of the base Cell class, like how AgentSet is a collection of the base Agent class.

I'm not entirely sure about this though. Anyways even if we'd like to make the change, this could be in a separate issue/PR?

Any comments/ideas on this issue will be much appreciated! @EwoutH @Corvince

quaquel · 2025-02-09T19:55:00Z

I don't follow your response to point 1. The use of weakrefs is a deliberate implementation choice in AgentSet that has knock-on consequences for all other methods in AgentSet. There is no compelling reason to use weakrefs elsewhere (e.g., for CellCollection). So, the amount of shared code is quite small to the point of being nonexistent and hence my suggestion to use a protocol.

AdamZh0u · 2025-02-10T04:37:44Z

Hi @quaquel, @wang-boyu, and @SongshGeo! Thank you for your comments.

It seems we have reached a consensus on enhancing the collections, but there is a point of disagreement on whether to use duck typing (via a protocol) or a CollectionBase class. I am considering three possible approaches:

1. Duck typing.
2. Using the CollectionBase class but without weakref.
3. Using the CollectionBase class with weakref.

I have already implemented approach 3, and I will also implement the other two. A few benchmark experiments should be enough to compare the performance of the three approaches. I will run them, and we can decide on the next steps based on the results.

If you have any suggestions or guidance regarding the benchmark experiments, please let me know!
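
As a starting point, a benchmark could be as small as the sketch below. It is only an illustration built on the standard-library `timeit` module: the `Item` class, the collection size, and the choice to time a plain iteration pass are all assumptions, and a real comparison would need to cover select, shuffle, and the other methods as well.

```python
import timeit
import weakref


class Item:
    """Hypothetical stand-in for an agent or cell."""


def time_iteration(storage, repeats: int = 20) -> float:
    """Return the best-of-N time in seconds for one full pass over storage."""
    return min(
        timeit.repeat(lambda: sum(1 for _ in storage), number=1, repeat=repeats)
    )


items = [Item() for _ in range(10_000)]  # hard references keep the items alive
strong = dict.fromkeys(items)
weak = weakref.WeakKeyDictionary(dict.fromkeys(items))

results = {
    "dict": time_iteration(strong),
    "WeakKeyDictionary": time_iteration(weak),
}
for name, seconds in results.items():
    print(f"{name}: {seconds * 1e6:.1f} us per pass")
```

Taking the minimum over several repeats reduces noise from other processes; absolute numbers will differ between machines, so only the relative difference is meaningful.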

Corvince · 2025-02-10T10:12:08Z

On a side note, CellCollection itself is a generic class. I guess there are different kinds of cells (unlike agents where user-defined agents subclass our mesa.Agent, so they're still of type mesa.Agent)?

mesa/mesa/experimental/cell_space/cell_collection.py

Line 32 in ba5104f

class CellCollection(Generic[T]):

If this is the case, then we need to change the definition of CellCollection to something like

class CellCollection(Generic[T], CollectionBase[T]):
    """Cell-specific collection implementation"""
    pass

@quaquel already got it right, but yes: DiscreteSpace allows using MyCell instead of Cell, and the generic aims to represent that in the types (although it is not 100% there). For CollectionBase, though, this should not matter or complicate things. I think it is already reflected in CollectionBase[Cell] vs CollectionBase[MyCell].
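
A small typing sketch of that last point. `Cell`, `MyCell`, and this stripped-down, strong-reference `CollectionBase` are stand-ins written for illustration, not the Mesa classes.

```python
from typing import Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")


class Cell:
    """Stand-in for Mesa's Cell (not the real class)."""


class MyCell(Cell):
    """User-defined cell with an extra attribute."""

    def __init__(self, height: float = 0.0) -> None:
        self.height = height


class CollectionBase(Generic[T]):
    """Simplified, strong-reference version of the proposed base class."""

    def __init__(self, items: Iterable[T]) -> None:
        self._items = list(items)

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)


class CellCollection(CollectionBase[T]):
    """Stays generic by passing T through; no extra Generic[T] is required."""


plain: CellCollection[Cell] = CellCollection([Cell(), Cell()])
custom: CellCollection[MyCell] = CellCollection([MyCell(1.5), MyCell(2.5)])

# A type checker knows each element of `custom` is a MyCell, so .height is valid.
total_height = sum(cell.height for cell in custom)
```

Subclassing `CollectionBase[T]` with the type variable left open is enough to keep `CellCollection` parameterizable by the cell type.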

Corvince · 2025-02-10T10:19:12Z

Do we need a full class hierarchy for this, or can we use duck-typing instead? That is, it might be easier to agree on a shared interface across the collection types and standardize this as a protocol

I would lean towards a protocol, although I agree it depends on how similar the code is. select() seems pretty similar, but there are already small differences (inplace), so I am unsure whether we can really consolidate every use case in a single method. It might get ugly around the corners. Duck typing keeps things nicely separated where differences are present (already visible with weakref/no-weakref). Code duplication can also be avoided by calling a shared function.
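
For instance, the shared-function idea could look roughly like this. `select_items`, `ToyAgentSet`, and `ToyCellCollection` are hypothetical names for this sketch; the helper mirrors the at_most handling of the two select() methods quoted earlier in the thread, while each class keeps its own differences (such as inplace).

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar, Union

T = TypeVar("T")


def select_items(
    items: Iterable[T],
    total: int,
    filter_func: Optional[Callable[[T], bool]] = None,
    at_most: Union[int, float] = float("inf"),
) -> Iterator[T]:
    """Yield at most `at_most` items that pass `filter_func` (hypothetical helper)."""
    # A float no greater than 1.0 is read as a fraction of the collection, rounded down.
    if isinstance(at_most, float) and at_most <= 1.0:
        at_most = int(total * at_most)
    count = 0
    for item in items:
        if count >= at_most:
            break
        if filter_func is None or filter_func(item):
            yield item
            count += 1


class ToyAgentSet:
    """Toy collection that supports in-place selection."""

    def __init__(self, agents) -> None:
        self._agents = list(agents)

    def __len__(self) -> int:
        return len(self._agents)

    def __iter__(self):
        return iter(self._agents)

    def select(self, filter_func=None, at_most=float("inf"), inplace=False):
        # Materialize first so that replacing self._agents is safe.
        selected = list(select_items(self, len(self), filter_func, at_most))
        if inplace:
            self._agents = selected
            return self
        return ToyAgentSet(selected)


class ToyCellCollection:
    """Toy collection without an inplace option."""

    def __init__(self, cells) -> None:
        self._cells = list(cells)

    def __len__(self) -> int:
        return len(self._cells)

    def __iter__(self):
        return iter(self._cells)

    def select(self, filter_func=None, at_most=float("inf")):
        return ToyCellCollection(select_items(self, len(self), filter_func, at_most))


picked = ToyAgentSet(range(10)).select(lambda a: a % 2 == 0, at_most=3)
half = ToyCellCollection(range(10)).select(at_most=0.5)
```

The two classes stay unrelated in the type hierarchy, so this combines with a protocol rather than replacing it.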

wang-boyu · 2025-02-10T19:50:03Z

Thanks @Corvince. It would probably be easier to try duck typing (e.g., through Protocol) first? Then we can start adding methods into CellCollection (it only has select() for now) and check whether it's feasible to be merged with AgentSet into a generic CollectionBase from there.

I'm thinking perhaps it's possible to have a generic CollectionBase even if we want both weak ref and regular dictionary in it, by having some kind of weak_ref: bool = True or False switch when initializing it. But it may be too early to tell and we can revisit later.
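
Something along these lines, as a very rough sketch. The `weak_ref` flag and this pared-down `CollectionBase` are hypothetical, and `Item` is a stand-in for an agent or cell.

```python
import gc
import weakref
from typing import Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")


class CollectionBase(Generic[T]):
    """Sketch only: the storage type is chosen by a hypothetical weak_ref flag."""

    def __init__(self, items: Iterable[T], weak_ref: bool = True) -> None:
        pairs = dict.fromkeys(items)  # keeps insertion order, drops duplicates
        self._items = weakref.WeakKeyDictionary(pairs) if weak_ref else pairs

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator[T]:
        return iter(self._items.keys())

    def __contains__(self, item: object) -> bool:
        return item in self._items


class Item:
    """Hypothetical stand-in for an agent or cell."""


# Separate owners, so the strong collection cannot keep the weak one's items alive.
weak_owner = [Item(), Item()]
strong_owner = [Item(), Item()]
weak = CollectionBase(weak_owner, weak_ref=True)
strong = CollectionBase(strong_owner, weak_ref=False)

weak_owner.clear()
strong_owner.clear()
gc.collect()

weak_len, strong_len = len(weak), len(strong)
```

Methods that only iterate would work unchanged with either storage, but anything that depends on item lifetime (such as the inplace handling in AgentSet.select) would still need care.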

quaquel · 2025-02-10T20:02:56Z

I like this pragmatic way forward. Let's standardize and formalize the interface first via a protocol. That should be pretty straightforward because I have tried to keep CellCollection and AgentSet similar and have ported API ideas from one to the other.

wang-boyu added the enhancement label on Feb 9, 2025
