New alias system towards a more Pythonic interface #3239

Open
seisman opened this issue May 9, 2024 · 6 comments
Labels
discussions Need more discussion before taking further actions

Comments

@seisman
Member

seisman commented May 9, 2024

GMT's single-letter options (e.g. -B) are difficult to read/understand, so they are not recommended for use in PyGMT. Instead, PyGMT uses long-form parameters and the PyGMT alias system is responsible for translating PyGMT long-form parameters into the corresponding short-form GMT options. The alias system was originally implemented by @leouieda in aad12e0 (seven years ago!) and hasn't changed much since then. The alias system has some limitations and flaws that prevent us from achieving the project goal: "Build a Pythonic API for GMT". Now it's time to design a new alias system. This issue reviews the current alias system and proposes a new alias system.

The current alias system

Here, we review the current alias system and discuss its limitations and flaws.

Details

Currently, the alias system looks like this:

@fmt_docstring
@use_alias(
    R="region",
    B="frame",
    J="projection",
)
@kwargs_to_strings(R="sequence")
def func(self, **kwargs):
    with Session() as lib:
        lib.call_module("basemap", args=build_arg_list(kwargs))

The current alias system works in this way:

  1. The kwargs_to_strings decorator converts an argument to a string. The argument can be a string, a numeric value, or a sequence (e.g., converting region=[10, 20, 30, 40] to region="10/20/30/40").
  2. The use_alias decorator maps long-form PyGMT parameters (e.g., region) to short-form GMT options (e.g., R). The short-form options are then stored in kwargs (i.e., converting region="10/20/30/40" to kwargs["R"]="10/20/30/40").
  3. build_arg_list (previously build_arg_string) converts the dictionary kwargs to a list/string that GMT API can take.
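The three steps above can be sketched in plain Python. The helpers below are simplified stand-ins written for illustration only; they are not PyGMT's actual kwargs_to_strings, use_alias, or build_arg_list implementations, and the names sequence_to_string, apply_aliases, and to_arg_list are hypothetical.

```python
# Simplified stand-ins for the three steps; not PyGMT's real code.


def sequence_to_string(value, separator="/"):
    """Step 1: join a sequence into one string; pass other values through."""
    if isinstance(value, (list, tuple)):
        return separator.join(str(item) for item in value)
    return value


def apply_aliases(kwargs, aliases):
    """Step 2: rename long-form parameters to single-letter GMT options."""
    long_to_short = {long: short for short, long in aliases.items()}
    return {long_to_short.get(key, key): value for key, value in kwargs.items()}


def to_arg_list(kwargs):
    """Step 3: turn the option dictionary into a list of '-Xvalue' strings."""
    return [f"-{key}{value}" for key, value in sorted(kwargs.items())]


aliases = {"R": "region", "B": "frame", "J": "projection"}
params = {"region": [10, 20, 30, 40], "projection": "X10c", "frame": "af"}

params = {key: sequence_to_string(value) for key, value in params.items()}
options = apply_aliases(params, aliases)
print(options["R"])          # 10/20/30/40
print(to_arg_list(options))  # ['-Baf', '-JX10c', '-R10/20/30/40']
```

After step 2, the original list [10, 20, 30, 40] is no longer reachable under the name region, which is the root of limitation 4 discussed below.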

The current alias system has some known limitations and flaws:

  1. Long arguments are difficult to read/write.

    Since each GMT option usually has many modifiers, some arguments are very long and no tab autocompletion is possible.

    Here is an example from What's the most Pythonic way for long GMT arguments? #1082:

    fig.logo(position="jTR+o0.3c/0.6c+w3c", box="+p1p+glightblue")
    

    The parameter names position and box are good, but their arguments are difficult to write/read. In What's the most Pythonic way for long GMT arguments? #1082, some candidate solutions (dict, class or function) were proposed. Please refer to What's the most Pythonic way for long GMT arguments? #1082 for detailed discussions.

  2. Short arguments are easy to write but difficult to read

    For some options, GMT uses single-letter arguments. Here are two examples:

    1. Figure.coast, resolution="f" is not readable. resolution="full" is more Pythonic.
    2. pygmt.binstats, statistic="z" is not readable. statistic="sum" is more Pythonic.

    To support Pythonic long-form arguments, we can use a dictionary which maps long-form arguments to short-form arguments. In the current alias system, it means a lot of coding effort, see POC: pygmt.binstats: Make the 'statistic' parameter more Pythonic #3012 and Figure.coast/pygmt.select/pygmt.grdlandmask: Use long names ("crude"/"low"/"intermediate"/"high"/"full") for the 'resolution' parameter #3013.

  3. Abuse of the kwargs parameter.

    Short-form GMT options are stored in the keyword argument kwargs, so it must be the last parameter for all wrappers that use the alias system.

  4. Can't access the original argument by the long-form parameter name inside the wrappers

    The alias system is implemented as decorators, so all conversions/mappings are done outside of the wrappers. It means we can't access the original argument by the long-form parameter name in the wrappers.

    For example, in Figure.plot, S is aliased to style. To access the argument of style, we have to use kwargs.get("S").

    Another example: region=[10, 20, 30, 40] is converted to kwargs["R"]="10/20/30/40". If we want to get the region bounds in the wrapper, we have to do the inverse conversion: w, e, s, n = kwargs["R"].split("/").

  5. Difficult to implement Pythonic high-level wrappers

    Due to the design of the GMT modules, each GMT module usually does too many things. For example, basemap/coast provide exactly the same options for adding a scale bar, a direction rose, and a magnetic rose. In Higher-level plotting methods for scale bar, map direction rose and magnetic rose? #2831, we proposed to provide high-level wrappers that do a single job. These high-level wrappers should have a Pythonic interface with many long-form parameters (see Higher-level plotting methods for scale bar, map direction rose and magnetic rose? #2831 for the proposed API), but it's unclear how to translate so many parameters into GMT short-form options (we can, but it usually means a lot of if-else tests, e.g., Add Pythonic argument options for colorbar frame parameters #2130).

    Another related issue is Higher-level plotting methods for Figure.plot and Figure.plot3d #2797 for high-level wrappers of plot and plot3d.
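Returning to limitation 2, the dictionary-based mapping can be sketched as follows. This is a minimal illustration, not code from #3012 or #3013: the helper to_short_form and the name RESOLUTIONS are hypothetical, the long names are taken from the title of #3013, and the single-letter codes are assumed to be their first letters.

```python
# Hypothetical helper; it does not exist in PyGMT.
RESOLUTIONS = {
    "crude": "c",
    "low": "l",
    "intermediate": "i",
    "high": "h",
    "full": "f",
}


def to_short_form(value, mapping):
    """Translate a long-form argument to its short form.

    Short forms are passed through unchanged so existing scripts keep working.
    """
    if value in mapping:
        return mapping[value]
    if value in mapping.values():
        return value
    raise ValueError(f"Invalid value {value!r}; expected one of {sorted(mapping)}.")


print(to_short_form("full", RESOLUTIONS))  # f
print(to_short_form("f", RESOLUTIONS))     # f
```

With the decorator-based alias system, a call like this has to be written by hand in every wrapper that needs it, which is the coding effort mentioned above.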

The new alias system version 1

Here, I propose a new alias system after half a year of design and coding (design takes more time than coding!). The new alias system is implemented in pygmt/alias.py of PR #3238.

This alias system, called v1, has since been superseded by an improved design. Its POC implementation is available in the alias-system-v1 branch. Version 2 is discussed in the "The new alias system version 2" section below.

Details

The Alias class

The Alias class defines how to convert the argument of a long-form parameter name to a string (or a sequence of strings) that can be passed to GMT API.

In the example below, we define a parameter offset. Its value can be a number, a string, a sequence, or any object whose string representation (__str__) makes sense to GMT. If a sequence is given, it is joined into a string with the separator '/'. The prefix +o is also added at the beginning of the string.

>>> from pygmt.alias import Alias
>>> par = Alias("offset", prefix="+o", separator="/")
>>> par.value = (2.0, 2.0)
>>> par.value
'+o2.0/2.0'

The Alias class exposes a value property with a setter, so the argument is converted at the moment Alias.value is assigned.

Here are more examples:

>>> from pygmt.alias import Alias
>>> par = Alias("frame")
>>> par.value = ("xaf", "yaf", "WSen")
>>> par.value
['xaf', 'yaf', 'WSen']

>>> par = Alias("resolution", mapping=True)
>>> par.value = "full"
>>> par.value
'f'

>>> par = Alias("statistic", mapping={"mean": "a", "mad": "d", "rms": "r", "sum": "z"})
>>> par.value = "mean"
>>> par.value
'a'
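To make the behaviour in these examples concrete, here is a minimal sketch of how such a class could work. It is a hypothetical re-implementation, not the code in PR #3238. In particular, it assumes that mapping=True means "keep only the first letter", which is a guess that happens to match the resolution example.

```python
# Hypothetical re-implementation for illustration; not the code in PR #3238.
class Alias:
    def __init__(self, name, prefix="", separator=None, mapping=None):
        self.name = name
        self.prefix = prefix
        self.separator = separator
        self.mapping = mapping
        self._value = None

    @property
    def value(self):
        """The converted, GMT-ready string (or list of strings)."""
        return self._value

    @value.setter
    def value(self, new_value):
        # The conversion runs once, at assignment time.
        if new_value is None or new_value is False:
            self._value = None
        elif new_value is True:
            self._value = self.prefix
        elif isinstance(new_value, (list, tuple)):
            items = [str(item) for item in new_value]
            if self.separator is None:
                self._value = items
            else:
                self._value = self.prefix + self.separator.join(items)
        else:
            self._value = self.prefix + self._map(str(new_value))

    def _map(self, text):
        if self.mapping is True:
            # Assumption: mapping=True keeps only the first letter.
            return text[:1]
        if isinstance(self.mapping, dict):
            return self.mapping.get(text, text)
        return text


offset = Alias("offset", prefix="+o", separator="/")
offset.value = (2.0, 2.0)
print(offset.value)  # +o2.0/2.0

resolution = Alias("resolution", mapping=True)
resolution.value = "full"
print(resolution.value)  # f
```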

The AliasSystem class

The AliasSystem class is similar to the old use_alias decorator: it aliases each GMT single-letter option to an Alias object or a list of Alias objects.

Here is an example:

>>> def func(par0, par1=None, par2=None, par3=None, par4=None, frame=False, panel=None, **kwargs):
...     alias = AliasSystem(
...         A=[
...             Alias("par1"),
...             Alias("par2", prefix="+j"),
...             Alias("par3", prefix="+o", separator="/"),
...         ],
...         B=Alias("frame"),
...         c=Alias("panel", separator=","),
...     )
...     return build_arg_list(alias.kwdict)
...
>>> func("infile", par1="mytext", par3=(12, 12), frame=True, panel=(1, 2), J="X10c/10c")
['-Amytext+o12/12', '-B', '-JX10c/10c', '-c1,2']

In this example, A is mapped to a list of Alias objects. So, arguments of par1/par2/par3 will be used to build the -A option (e.g., par1="mytext", par3=(12, 12) is converted to kwdict["A"]="mytext+o12/12"). It means now we can break any complicated GMT option into multiple long-form parameters.

The AliasSystem class provides the property kwdict which is a dictionary with single-letter options as keys and string/sequence as values. It can be passed directly to the build_arg_list function. The kwdict dictionary is dynamically calculated from the current values of long-form parameters. In this way, we can always access the original values of parameters by long-form parameter names and even make changes to them before accessing alias.kwdict property.
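The idea of computing kwdict on demand can be sketched as below. This is a hypothetical, much reduced version written for illustration; the real class lives in PR #3238 and may differ in every detail. The sketch assumes that AliasSystem is instantiated directly inside the wrapper, so that the caller's local variables hold the long-form arguments, and it uses a compact stand-in Alias with a convert method that is not part of the proposal.

```python
import inspect


class Alias:
    """Compact stand-in: a parameter name plus a prefix and a separator."""

    def __init__(self, name, prefix="", separator="/"):
        self.name = name
        self.prefix = prefix
        self.separator = separator

    def convert(self, value):
        if value is None or value is False:
            return None
        if value is True:
            return self.prefix
        if isinstance(value, (list, tuple)):
            value = self.separator.join(str(item) for item in value)
        return f"{self.prefix}{value}"


class AliasSystem:
    def __init__(self, **options):
        # Normalize so every option maps to a list of Alias objects.
        self.options = {
            key: aliases if isinstance(aliases, list) else [aliases]
            for key, aliases in options.items()
        }
        # Assumption: the caller is the wrapper function, so a snapshot of
        # its local variables holds the long-form arguments.
        self.params = dict(inspect.currentframe().f_back.f_locals)

    @property
    def kwdict(self):
        """Build the option dictionary from the stored parameter values."""
        result = {}
        for key, aliases in self.options.items():
            parts = [alias.convert(self.params.get(alias.name)) for alias in aliases]
            parts = [part for part in parts if part is not None]
            if parts:
                result[key] = "".join(parts)
        # Single-letter options passed through **kwargs are kept as-is.
        result.update(self.params.get("kwargs", {}))
        return result


def func(par1=None, par3=None, frame=False, panel=None, **kwargs):
    alias = AliasSystem(
        A=[Alias("par1"), Alias("par3", prefix="+o")],
        B=Alias("frame"),
        c=Alias("panel", separator=","),
    )
    return alias.kwdict


print(func(par1="mytext", par3=(12, 12), frame=True, panel=(1, 2), J="X10c/10c"))
# {'A': 'mytext+o12/12', 'B': '', 'c': '1,2', 'J': 'X10c/10c'}
```

Because the dictionary is rebuilt on each access, the wrapper keeps working with par1, par3, and frame under their own names until the moment the GMT arguments are needed.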

The BaseParam class for common parameters

As discussed in #1082, for some options, it makes more sense to define a class to avoid having too many (potentially conflicting) parameter names.

With the help of the Alias system, the BaseParam implementation is easy. Users won't use the BaseParam class but we developers can use it to create new classes in a few lines without much coding effort (So adding new classes can be marked as "good-first-issue"!).

The Box class

In pygmt/params/box.py, I've implemented the Box class as an example. The box parameter is commonly used for plotting scale bar, color bar, gmt logo, images, inset, and more. So it makes sense to have a Box class.

Below is the definition of the Box class. To define a class for a parameter, we just need to define some fields (e.g., clearance/fill), and the special field _aliases, which is a list of Alias objects.

@dataclass(repr=False)
class Box(BaseParam):
    """
    Docstrings.
    """

    clearance: float | str | Sequence[float | str] | None = None
    fill: str | None = None
    innerborder: str | Sequence | None = None
    pen: str | None = None
    radius: float | bool | None = False
    shading: str | Sequence | None = None

    _aliases: ClassVar = [
        Alias("clearance", prefix="+c", separator="/"),
        Alias("fill", prefix="+g"),
        Alias("innerborder", prefix="+i", separator="/"),
        Alias("pen", prefix="+p"),
        Alias("radius", prefix="+r"),
        Alias("shading", prefix="+s", separator="/"),
    ]

Here is an example. Please refer to the docstrings for more examples.

>>> str(Box(clearance=(0.1, 0.2, 0.3, 0.4), pen="blue", radius="10p"))
'+c0.1/0.2/0.3/0.4+pblue+r10p'

It's important to know that the Box class supports autocompletion!

The Frame/Axes/Axis classes

The -B option is one of the most complicated GMT options. It can be repeated multiple times on the GMT command line, which makes it harder to support in Python.

In pygmt/params/frame.py, the Frame/Axes/Axis classes are implemented to address one of our oldest issues #249.

The technical details don't matter much. Here is an example use:

>>> import pygmt
>>> from pygmt.params import Frame, Axes, Axis

>>> fig = pygmt.Figure()
>>> # define a Frame object
>>> frame = Frame(
...     axes=Axes("WSen", title="My Plot Title", fill="lightred"),
...     xaxis=Axis(10, angle=30, label="X axis", unit="km"),
...     yaxis=Axis(20, label="Y axis")
... )
>>> fig.basemap(region=[0, 80, -30, 30], projection="X10c", frame=frame)
>>> fig.show()

Check out PR #3238 and try it yourself! Enjoy autocompletion!

Pros/Cons of the new alias system

Pros:

  1. The new and old alias systems can co-exist. So we don't have to migrate all wrappers in a single PR.
  2. Allow building a GMT option argument from multiple PyGMT parameters (More Pythonic)
  3. No abuse of kwargs anymore
  4. Define new parameter classes in a simple way
  5. Access the original argument by parameter name, not by dict lookup like kwargs.get("S") (Maybe faster)
  6. Autocompletion for parameter classes like Box/Frame
  7. Autocompletion of all function parameters after Refactor the function definitions to better support type hints and auto-completion #2896.
  8. Autocompletion for long-form arguments if we add type hints.

Cons:

  1. Big refactors may introduce new bugs. [We can always fix them if any.]
  2. The placeholder {aliases} in docstrings is not supported in the new alias system. [The list of aliases is not needed if we write good documentation.]

The new alias system version 2

The new alias system version 2 is an improved version of version 1. The main difference is the Alias class, so please read the section for version 1 first. The POC implementation of the new alias system version 2 is available at #3238.

Details

In version 1, the Alias class is used like below:

>>> par = Alias("offset", prefix="+o", separator="/")
>>> par.value = (2.0, 2.0)
>>> par.value
'+o2.0/2.0'

It defines an alias for the parameter offset. To access the argument/value of the parameter, we need to either use the inspect module to access the local variables in the module wrappers (e.g., p_locals = inspect.currentframe().f_back.f_locals), or use getattr in classes like Box. That's not ideal.

In version 2, it's used like below:

>>> offset = (2.0, 2.0)
>>> par = Alias(offset, prefix="+o", separator="/")
>>> par.value
(2.0, 2.0)
>>> par._value
'+o2.0/2.0'

The argument is passed directly as the first argument of the Alias object. The original value is stored as par.value and the corresponding string (or sequence of strings) is stored as par._value.
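A minimal sketch of the version 2 idea follows. It is a hypothetical illustration rather than the class in PR #3238; the helper build_option and the prefix "j" used for the anchor are invented for the example. The point it tries to show is that, once the value is passed in directly, a wrapper needs neither frame inspection nor getattr.

```python
class Alias:
    """Hypothetical sketch of version 2: the argument is passed in directly."""

    def __init__(self, value, prefix="", separator=None):
        self.value = value  # original argument, kept untouched
        self.prefix = prefix
        self.separator = separator
        self._value = self._convert(value)  # GMT-ready string(s)

    def _convert(self, value):
        if value is None or value is False:
            return None
        if value is True:
            return self.prefix
        if isinstance(value, (list, tuple)):
            items = [str(item) for item in value]
            if self.separator is None:
                return items
            value = self.separator.join(items)
        return f"{self.prefix}{value}"


def build_option(anchor=None, offset=None):
    """Hypothetical wrapper fragment that combines two parameters."""
    parts = [
        Alias(anchor, prefix="j")._value,
        Alias(offset, prefix="+o", separator="/")._value,
    ]
    return "".join(part for part in parts if part is not None)


par = Alias((2.0, 2.0), prefix="+o", separator="/")
print(par.value)   # (2.0, 2.0)
print(par._value)  # +o2.0/2.0
print(build_option(anchor="TR", offset=("0.3c", "0.6c")))  # jTR+o0.3c/0.6c
```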

This is another big refactor towards a Pythonic interface! Ping @GenericMappingTools/pygmt-maintainers for comments.

@seisman
Member Author

seisman commented May 23, 2024

Ping @GenericMappingTools/pygmt-maintainers for comments and thoughts.

@weiji14
Member

weiji14 commented May 23, 2024

Thanks @seisman for opening up this for discussion. The Alias class you've implemented in #3238 seems to be meant for internal use (as a replacement for @use_alias), rather than something user-facing? I do like point 5 (Access the original argument by parameter name), which would help with simplifying the makeup of internal functions (especially high level functions in the pipeline), and moving away from @ decorators means users will see a cleaner traceback on errors.

I'll need more time to look into your implementation at #3238. My initial impression is that the implementation of Alias could be done as a first step in one PR, followed by the implementation of the Param class. I'm also wondering if this is a good time to bring in Pydantic to help with some validation logic based on type hints, essentially making syntax errors appear on the Python level rather than the GMT level (though that means PyGMT will need to re-implement a lot of GMT's internal validation logic).

@seisman
Member Author

seisman commented May 24, 2024

The Alias class you've implemented in #3238 seems to be meant for internal use (as a replacement for @use_alias), rather than something user-facing?

Yes.

I'm also wondering if this is a good time to bring in Pydantic to help with some validation logic based on type hints, essentially making syntax errors appear on the Python level rather than the GMT level

It looks worth a try.

@seisman
Member Author

seisman commented Apr 8, 2025

@GenericMappingTools/pygmt-maintainers

The new alias system proposed here is likely the most significant refactoring since the start of the project. The initial implementation of the new alias system v2 is available in PR #3238, and I believe it's already in good shape and ready for an initial review.

As PR #3238 serves primarily as a proof of concept, please focus your feedback on how the new alias system enhances the maintainer experience towards making a Pythonic PyGMT, as well as how it improves the user experience with more Pythonic parameters/arguments. After the first round of review, we plan to break the POC PR into several smaller, focused PRs for formal review.

@yvonnefroehlich
Member

yvonnefroehlich commented Apr 10, 2025

@GenericMappingTools/pygmt-maintainers

The new alias system proposed here is likely the most significant refactoring since the start of the project. The initial implementation of the new alias system v2 is available in PR #3238, and I believe it's already in good shape and ready for an initial review.

As PR #3238 serves primarily as a proof of concept, please focus your feedback on how the new alias system enhances the maintainer experience towards making a Pythonic PyGMT, as well as how it improves the user experience with more Pythonic parameters/arguments. After the first round of review, we plan to break the POC PR into several smaller, focused PRs for formal review.

This looks like one of the most significant changes so far! Unfortunately, I will be quite busy over the next weeks and months with writing and completing my PhD. So, it’s very likely that I will not have much time before the middle or end of July to look at larger changes in more detail. Is there a time frame in which people would like to start implementing the new alias system?

@seisman
Member Author

seisman commented Apr 11, 2025

Within the next weeks or months, I will be unfortunately quite busy with writing and completing my PhD. So, it’s very likely that I will not have much time before middle or end of July to look at larger changes in more detail. Is there a time range at which people actually like to start implementing the new alias system?

No worries at all. Completing your PhD is definitely your top priority right now.

As for the alias system, there's no fixed schedule yet. The main motivations for pushing it forward are that (1) it makes PyGMT more Pythonic; (2) some new features and enhancements (e.g., #2831) are waiting for it; (3) I bet it's a must-have for the PyGMT paper.

PR #3238 can be split into multiple (perhaps around 10) smaller PRs to make reviews easier. Ideally, I'd prefer to have these PRs reviewed and merged within a single release cycle (i.e., focusing on it in a 3-month period) to avoid back-and-forth changes, especially since we're still experimenting with the design.

Alternatively, we could create a dedicated branch for the new alias system, merge the smaller PRs into that branch, and only merge it into main once things feel stable.


3 participants