A unified interface for different collection types #2671

Open
AdamZh0u opened this issue Feb 9, 2025 · 11 comments
Labels
enhancement Release notes label

Comments

@AdamZh0u
Contributor

AdamZh0u commented Feb 9, 2025

What's the problem this feature will solve?
Currently, Mesa has multiple collection implementations (AgentSet, CellCollection) with overlapping functionality. This leads to:

  1. Code duplication across different collection types
  2. Inconsistent interfaces between collections
  3. Difficulty in maintaining and extending collection functionality
  4. Challenges when developing new collection types (e.g., for mesa-geo's LayerCollection @wang-boyu )

Describe the solution you'd like
Introduce a new CollectionBase class that would:

  1. Provide a common foundation for all Mesa collections with shared functionality:

    • Basic attributes (__contains__, __len__)
    • Set and get operations
    • Sequence operations (indexing, iteration)
    • Filtering and selection (select, shuffle, sort)
    • Method invocation on items (map)
    • Aggregation operations
  2. Use weak references for proper memory management and garbage collection, crucial for simulation performance

Example usage:

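import weakref
from collections.abc import Iterable
from random import Random
from typing import Generic, TypeVar

# Agent, Cell, and Layer below are assumed to be the existing Mesa and mesa-geo classes.
T = TypeVar("T")

class CollectionBase(Generic[T]):
    """A base collection class that provides set-like and sequence functionality."""
    def __init__(self, items: Iterable[T], random: Random | None = None):
        self.random = random  # stored on the assumption that shuffle() and similar methods need it
        self._initialize_storage(items)

    def _initialize_storage(self, items: Iterable[T]) -> None:
        # Weak references let items be garbage collected once nothing else holds them.
        self._items = weakref.WeakKeyDictionary({item: None for item in items})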

class AgentSet(CollectionBase[Agent]):
    """Agent-specific collection implementation"""
    pass

class CellCollection(CollectionBase[Cell]):
    """Cell-specific collection implementation"""
    pass

class LayerCollection(CollectionBase[Layer]):
    """Geographic layer collection for mesa-geo"""
    pass

Benefits:

  1. Reduced code duplication and consistent interface across all collections
  2. Easier maintenance and simplified development of new collection types
  3. Collection based operations like aggregation and mapping
  4. Improved performance through shared optimisations

Additional context

This would be a significant architectural improvement for Mesa, making it more maintainable and extensible while providing a better developer experience.

@SongshGeo

I agree with that. We will use a similar collection implementation to manage layers, regular agents and geo-agents (e.g., with vector geometry). When refactoring Mesa-Geo, we found this will be very useful. ;)
I am excited to hear any updates.

@quaquel
Member

quaquel commented Feb 9, 2025

Do we need a full class hierarchy for this, or can we use duck-typing instead? That is, it might be easier to agree on a shared interface across the collection types and standardize this as a protocol
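
As a rough illustration of the duck-typing route, a shared interface could be written down with typing.Protocol. This is only a sketch: the name CollectionLike, the chosen method set, and the toy NumberBag class are assumptions made for the example, not part of Mesa.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator
from typing import Protocol, TypeVar, runtime_checkable

T_co = TypeVar("T_co", covariant=True)


@runtime_checkable
class CollectionLike(Protocol[T_co]):
    """Hypothetical shared interface; the name and method set are assumptions."""

    def __len__(self) -> int: ...

    def __iter__(self) -> Iterator[T_co]: ...

    def __contains__(self, item: object) -> bool: ...

    def select(
        self, filter_func: Callable[[T_co], bool] | None = None
    ) -> CollectionLike[T_co]: ...


class NumberBag:
    """Toy collection that satisfies the protocol without inheriting from it."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, item):
        return item in self._items

    def select(self, filter_func=None):
        # Keep every item when no filter is given, otherwise only the matches.
        return NumberBag(
            i for i in self._items if filter_func is None or filter_func(i)
        )


bag = NumberBag([1, 2, 3, 4])
evens = bag.select(lambda n: n % 2 == 0)
# The runtime check only looks for the presence of the protocol's methods.
conforms = isinstance(bag, CollectionLike)
```

With this approach each collection keeps its own storage and implementation, and only the method names and signatures are agreed on.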

@wang-boyu
Member

wang-boyu commented Feb 9, 2025

protocol vs. generic probably depends on the amount of re-usable code between sub-types, such as between AgentSet and CellCollection. For now their select() methods look pretty much identical, apart from the typing from Agent to Cell:

mesa/mesa/agent.py

Lines 200 to 246 in ba5104f

def select(
    self,
    filter_func: Callable[[Agent], bool] | None = None,
    at_most: int | float = float("inf"),
    inplace: bool = False,
    agent_type: type[Agent] | None = None,
) -> AgentSet:
    """Select a subset of agents from the AgentSet based on a filter function and/or quantity limit.
    Args:
        filter_func (Callable[[Agent], bool], optional): A function that takes an Agent and returns True if the
            agent should be included in the result. Defaults to None, meaning no filtering is applied.
        at_most (int | float, optional): The maximum amount of agents to select. Defaults to infinity.
            - If an integer, at most the first number of matching agents are selected.
            - If a float between 0 and 1, at most that fraction of original the agents are selected.
        inplace (bool, optional): If True, modifies the current AgentSet; otherwise, returns a new AgentSet. Defaults to False.
        agent_type (type[Agent], optional): The class type of the agents to select. Defaults to None, meaning no type filtering is applied.
    Returns:
        AgentSet: A new AgentSet containing the selected agents, unless inplace is True, in which case the current AgentSet is updated.
    Notes:
        - at_most just return the first n or fraction of agents. To take a random sample, shuffle() beforehand.
        - at_most is an upper limit. When specifying other criteria, the number of agents returned can be smaller.
    """
    inf = float("inf")
    if filter_func is None and agent_type is None and at_most == inf:
        return self if inplace else copy.copy(self)
    # Check if at_most is of type float
    if at_most <= 1.0 and isinstance(at_most, float):
        at_most = int(len(self) * at_most)  # Note that it rounds down (floor)
    def agent_generator(filter_func, agent_type, at_most):
        count = 0
        for agent in self:
            if count >= at_most:
                break
            if (not filter_func or filter_func(agent)) and (
                not agent_type or isinstance(agent, agent_type)
            ):
                yield agent
                count += 1
    agents = agent_generator(filter_func, agent_type, at_most)
    return AgentSet(agents, self.random) if not inplace else self._update(agents)

and

def select(
    self,
    filter_func: Callable[[T], bool] | None = None,
    at_most: int | float = float("inf"),
):
    """Select cells based on filter function.
    Args:
        filter_func: filter function
        at_most: The maximum amount of cells to select. Defaults to infinity.
            - If an integer, at most the first number of matching cells is selected.
            - If a float between 0 and 1, at most that fraction of original number of cells
    Returns:
        CellCollection
    """
    if filter_func is None and at_most == float("inf"):
        return self
    if at_most <= 1.0 and isinstance(at_most, float):
        at_most = int(len(self) * at_most)  # Note that it rounds down (floor)
    def cell_generator(filter_func, at_most):
        count = 0
        for cell in self:
            if count >= at_most:
                break
            if not filter_func or filter_func(cell):
                yield cell
                count += 1
    return CellCollection(cell_generator(filter_func, at_most), random=self.random)

Using inheritance might help reduce this duplicated code.

On a side note, CellCollection itself is a generic class. I guess there are different kinds of cells (unlike agents where user-defined agents subclass our mesa.Agent, so they're still of type mesa.Agent)?

class CellCollection(Generic[T]):

If this is the case, then we need to change the definition of CellCollection to something like

class CellCollection(Generic[T], CollectionBase[T]):
    """Cell-specific collection implementation"""
    pass

@quaquel
Member

quaquel commented Feb 9, 2025

  1. The weakref stuff is probably agentset specific, and will make it hard to generalize e.g., select. Likewise, the inplace bool is primarily there for performance reasons related to weakrefs.
  2. The typing on CellCollection was done by @Corvince, so I am not fully sure about the reasoning behind it. I am not intimately familiar with the details of Python typing yet. In my understanding, Generic[T] just means that CellCollection is a collection containing only instances of type T. I am not even sure we properly use the power of Generic elsewhere in the CellSpace code. At the moment, we only have a single Cell type, but the code is written such that users can implement and use their own Cell class as well.

@wang-boyu wang-boyu added the enhancement Release notes label label Feb 9, 2025
@wang-boyu
Member

Thanks for the response!

  1. The weakref stuff is probably agentset specific, and will make it hard to generalize e.g., select. Likewise, the inplace bool is primarily there for performance reasons related to weakrefs.

This isn't really an issue for me - cells can be owned by a discrete space that has all cells:

self._cells: dict[tuple[int, ...], T] = {}

much like how agents are owned by a model:

self._agents = {} # the hard references to all agents in the model

Similarly layers are owned by HasPropertyLayers:

self._mesa_property_layers = {}

Then we can have the performance improvement for all collections.
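
The ownership pattern described here can be sketched in a few lines. Item and Owner are hypothetical stand-ins for an agent, cell, or layer and for the model or space that owns it; they are assumptions for the example and not Mesa classes.

```python
import gc
import weakref


class Item:
    """Hypothetical stand-in for an agent, cell, or layer."""

    def __init__(self, name):
        self.name = name


class Owner:
    """Hypothetical stand-in for a model or space holding the hard references."""

    def __init__(self, names):
        self._items = {name: Item(name) for name in names}

    def remove(self, name):
        del self._items[name]


owner = Owner(["a", "b", "c"])
# The collection only keeps weak references to what the owner holds.
view = weakref.WeakKeyDictionary({item: None for item in owner._items.values()})

before = sorted(item.name for item in view)
owner.remove("b")
gc.collect()  # CPython frees the item right away; collect() covers other runtimes
after = sorted(item.name for item in view)
```

Once the owner drops its hard reference, the item disappears from the weakly referencing collection without any explicit removal call.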

  1. At the moment, we only have a single Cell type, but the code is written such that users can implement and use their own Cell class as well.

In this case, CellCollection probably doesn't need to be a generic class. Instead, it could be a collection of the base Cell class, like how AgentSet is a collection of the base Agent class.

I'm not entirely sure about this though. Anyway, even if we'd like to make the change, this could be in a separate issue/PR?

Any comments/ideas on this issue will be much appreciated! @EwoutH @Corvince

@quaquel
Member

quaquel commented Feb 9, 2025

I don't follow your response to point 1. The use of weakrefs is a deliberate implementation choice in AgentSet that has knock-on consequences for all other methods in AgentSet. There is no compelling reason to use weakrefs elsewhere (e.g., for CellCollection). So, the amount of shared code is quite small to the point of being nonexistent and hence my suggestion to use a protocol.

@AdamZh0u
Contributor Author

AdamZh0u commented Feb 10, 2025

Hi @quaquel, @wang-boyu, and @SongshGeo! Thank you for your comments.

It seems we have reached a consensus on enhancing the collection, but there is a point of disagreement on whether to use duck typing as a protocol or the CollectionBase class. I am considering three possible approaches:

  1. Duck typing.
  2. Using the CollectionBase class but without weakref.
  3. Using the CollectionBase class with weakref.

I have already implemented approach 3, and I will also implement the other two approaches. I think we only need to run some benchmark experiments to compare the performance of these three approaches. I will proceed to complete these experiments, and we can decide on the next steps based on the results.

If you have any suggestions or guidance regarding the benchmark experiments, please let me know!
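
One possible starting point for such a benchmark is a timeit comparison of iterating over weakly and strongly referenced storage. This is a sketch under stated assumptions: Item is a hypothetical stand-in, the sizes and repeat counts are arbitrary, and iteration is only one of the operations worth measuring.

```python
import timeit
import weakref


class Item:
    """Hypothetical stand-in for an agent or cell."""


items = [Item() for _ in range(10_000)]  # hard references keep the items alive
strong = dict.fromkeys(items)
weak = weakref.WeakKeyDictionary(dict.fromkeys(items))


def iterate(collection):
    """Walk the collection once and count the items."""
    return sum(1 for _ in collection)


timings = {
    "dict": timeit.timeit(lambda: iterate(strong), number=20),
    "WeakKeyDictionary": timeit.timeit(lambda: iterate(weak), number=20),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

Absolute numbers depend on the machine and Python version, so only the relative difference between the two storage types is informative.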

@Corvince
Contributor

On a side note, CellCollection itself is a generic class. I guess there are different kinds of cells (unlike agents where user-defined agents subclass our mesa.Agent, so they're still of type mesa.Agent)?

mesa/mesa/experimental/cell_space/cell_collection.py

Line 32 in ba5104f

class CellCollection(Generic[T]):

If this is the case, then we need to change the definition of CellCollection to something like

class CellCollection(Generic[T], CollectionBase[T]):
    """Cell-specific collection implementation"""
    pass

@quaquel already got it right, but yes, DiscreteSpace allows using MyCell instead of Cell, and the generic aims to represent that in types (although it is not 100% there). But for CollectionBase it should not matter or complicate things; I think this is already reflected in CollectionBase[Cell] vs CollectionBase[MyCell].
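
To illustrate the CollectionBase[Cell] vs CollectionBase[MyCell] point, here is a minimal sketch. The Cell and MyCell classes, the height attribute, and this stripped-down CollectionBase are stand-ins invented for the example, not the proposed implementation.

```python
from collections.abc import Iterable, Iterator
from typing import Generic, TypeVar

T = TypeVar("T")


class Cell:
    """Hypothetical stand-in for Mesa's Cell."""


class MyCell(Cell):
    """User-defined cell with an extra (hypothetical) attribute."""

    def __init__(self, height: float = 0.0):
        self.height = height


class CollectionBase(Generic[T]):
    """Minimal generic container, only for illustrating the type parameter."""

    def __init__(self, items: Iterable[T]):
        self._items = list(items)

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)


plain: CollectionBase[Cell] = CollectionBase([Cell(), Cell()])
custom: CollectionBase[MyCell] = CollectionBase([MyCell(1.5), MyCell(2.0)])

# A type checker treats each item of `custom` as MyCell, so .height is accepted.
heights = [cell.height for cell in custom]
```

The base class stays generic in one type variable, and the cell type is chosen where the collection is parameterized.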

@Corvince
Contributor

Do we need a full class hierarchy for this, or can we use duck-typing instead? That is, it might be easier to agree on a shared interface across the collection types and standardize this as a protocol

I would lean towards a protocol, although I agree it depends on how similar the code is. Select seems pretty similar, but there are already small differences (inplace), so I am unsure if we can really consolidate every use case in a single method. It might get ugly around the corners. Duck typing keeps things nicely separated where differences are present (already visible with weakref/no-weakref). Code can also be deduplicated by calling a shared function.
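
The shared-function idea could look roughly like the sketch below. The helper select_items and the two toy classes are hypothetical; the helper mirrors the counting and at_most handling of the two select() methods quoted earlier, while each class keeps its own differences such as inplace.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def select_items(
    items: Iterable[T],
    total: int,
    filter_func: Callable[[T], bool] | None = None,
    at_most: int | float = float("inf"),
) -> Iterator[T]:
    """Yield up to at_most items that pass filter_func (hypothetical helper)."""
    # A float no larger than 1.0 is read as a fraction of the collection size (floored).
    if isinstance(at_most, float) and at_most <= 1.0:
        at_most = int(total * at_most)
    count = 0
    for item in items:
        if count >= at_most:
            break
        if filter_func is None or filter_func(item):
            yield item
            count += 1


class ToyCellCollection:
    """Toy collection that always returns a new collection from select()."""

    def __init__(self, cells):
        self._cells = list(cells)

    def select(self, filter_func=None, at_most=float("inf")):
        picked = select_items(self._cells, len(self._cells), filter_func, at_most)
        return ToyCellCollection(picked)


class ToyAgentSet:
    """Toy collection that keeps its own inplace option on top of the helper."""

    def __init__(self, agents):
        self._agents = list(agents)

    def select(self, filter_func=None, at_most=float("inf"), inplace=False):
        picked = list(
            select_items(self._agents, len(self._agents), filter_func, at_most)
        )
        if inplace:
            self._agents = picked
            return self
        return ToyAgentSet(picked)


even_cells = ToyCellCollection(range(10)).select(lambda c: c % 2 == 0, at_most=3)
half = ToyAgentSet([1, 2, 3, 4]).select(at_most=0.5)
```

The filtering loop lives in one place, and neither class has to inherit from the other or from a common base.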

@wang-boyu
Member

Thanks @Corvince. It would probably be easier to try duck typing (e.g., through Protocol) first? Then we can start adding methods into CellCollection (it only has select() for now) and check whether it's feasible to be merged with AgentSet into a generic CollectionBase from there.

I'm thinking perhaps it's possible to have a generic CollectionBase even if we want both weak ref and regular dictionary in it, by having some kind of weak_ref: bool = True or False switch when initializing it. But it may be too early to tell and we can revisit later.
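
A minimal sketch of such a switch, assuming a constructor flag named weak_ref as floated above. The class body and the Thing stand-in are illustrative only and are not an agreed design.

```python
import gc
import weakref
from collections.abc import Iterable, Iterator
from typing import Generic, TypeVar

T = TypeVar("T")


class CollectionBase(Generic[T]):
    """Sketch only: storage is weak or strong depending on a constructor flag."""

    def __init__(self, items: Iterable[T], weak_ref: bool = True):
        pairs = {item: None for item in items}
        # Both storages are mappings keyed by item, so the methods below are shared.
        self._items = weakref.WeakKeyDictionary(pairs) if weak_ref else pairs

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)

    def __contains__(self, item: object) -> bool:
        return item in self._items


class Thing:
    """Hypothetical stand-in for an agent or cell."""


weak_items = [Thing() for _ in range(3)]
strong_items = [Thing() for _ in range(3)]
weak = CollectionBase(weak_items, weak_ref=True)
strong = CollectionBase(strong_items, weak_ref=False)

# Drop the only outside hard reference to one item in each list.
del weak_items[-1]
del strong_items[-1]
gc.collect()

sizes = (len(weak), len(strong))  # (2, 3): only the weak collection shrinks
```

Methods that mutate or copy the storage would still need to respect the flag, which is where the two variants may start to diverge.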

@quaquel
Member

quaquel commented Feb 10, 2025

I like this pragmatic way forward. Let's standardize and formalize the interface first via a protocol. That should be pretty straightforward because I have tried to keep CellCollection and AgentSet similar and have ported API ideas from one to the other.
