⚡️ Speed up method isoparser._parse_isodate_common by 15% in python_modules/dagster/dagster/_vendored/dateutil/parser/isoparser.py #63

Open: wants to merge 1 commit into base branch codeflash/optimize-remove_none_recursively-2024-06-26T09.20.53

codeflash-ai[bot] commented Jul 25, 2024

📄 isoparser._parse_isodate_common() in python_modules/dagster/dagster/_vendored/dateutil/parser/isoparser.py

📈 Performance improved by 15% (1.15x speedup)

⏱️ Runtime went down from 19.5 microseconds to 17.0 microseconds
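The headline 15% is consistent with reading it as a speedup ratio (old runtime divided by new runtime), not as a 15% cut in runtime. A quick check using only the two runtimes reported above:

```python
# Runtimes reported above, in microseconds.
old_us = 19.5
new_us = 17.0

# Speedup: how many times faster the new version runs.
speedup = old_us / new_us  # about 1.147, i.e. roughly 15% faster

# Reduction: the fraction of the original runtime that was removed.
reduction = (old_us - new_us) / old_us  # about 0.128, i.e. roughly 12.8% less time

print(f"speedup {speedup:.2f}x, runtime reduction {reduction:.1%}")
# prints: speedup 1.15x, runtime reduction 12.8%
```

Both numbers describe the same measurement; they differ only in which runtime is used as the baseline.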

Explanation and details

To reduce runtime and memory usage, the optimized version makes three kinds of change:

  1. Avoid redundant checks and calculations.
  2. Use fewer string slices.
  3. Restructure the logic flow to minimize branching.

Here's the improved version.

Summary of changes:

  1. Replaced the `components` list with direct assignments to `y`, `m`, and `d`.
  2. Removed unnecessary checks and minimized the number of string slices.
  3. Used boolean values directly in position increments (a `bool` adds as 0 or 1).
  4. Reordered the logic flow to reduce branching and the number of times the string is parsed.
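As an illustration only (this is not the code from this PR), here is a minimal standalone sketch of two of the listed ideas: plain `y`, `m`, `d` locals instead of a `components` list, and adding a `bool` straight onto the position counter. It is modeled on the control flow of upstream python-dateutil's `isoparser._parse_isodate_common`, but `parse_isodate_common_sketch` and its `date_sep` parameter are hypothetical names, and it operates on `str` for simplicity.

```python
def parse_isodate_common_sketch(dt_str: str, date_sep: str = "-"):
    """Hypothetical sketch of parsing YYYY, YYYY-MM, YYYY-MM-DD or YYYYMMDD.

    Returns ([year, month, day], position after the parsed date).
    """
    len_str = len(dt_str)
    if len_str < 4:
        raise ValueError("ISO string too short")

    # Year is always four digits; month and day default to 1.
    y, m, d = int(dt_str[0:4]), 1, 1
    pos = 4
    if pos >= len_str:
        return [y, m, d], pos

    # bool is a subclass of int: True advances pos by 1, False by 0.
    has_sep = dt_str[pos:pos + 1] == date_sep
    pos += has_sep

    # Month
    if len_str - pos < 2:
        raise ValueError("Invalid common month")
    m = int(dt_str[pos:pos + 2])
    pos += 2

    if pos >= len_str:
        if has_sep:
            return [y, m, d], pos  # YYYY-MM is allowed
        raise ValueError("Invalid ISO format")  # YYYYMM is not

    if has_sep:
        if dt_str[pos:pos + 1] != date_sep:
            raise ValueError("Invalid separator in ISO string")
        pos += 1

    # Day
    if len_str - pos < 2:
        raise ValueError("Invalid common day")
    d = int(dt_str[pos:pos + 2])
    return [y, m, d], pos + 2


print(parse_isodate_common_sketch("2023-01-01"))  # ([2023, 1, 1], 10)
print(parse_isodate_common_sketch("20230101"))    # ([2023, 1, 1], 8)
```

The `pos += has_sep` line replaces an `if has_sep: pos += 1` branch; whether that is measurably faster depends on the interpreter, so treat it as one plausible source of the gain rather than a confirmed one.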

Correctness verification

The new optimized code was tested for correctness. The results are listed below.

🔘 (none found) − ⚙️ Existing Unit Tests

✅ 15 Passed − 🌀 Generated Regression Tests

# imports
import pytest  # used for our unit tests
from dagster._vendored.dateutil.parser.isoparser import isoparser


# unit tests
def test_valid_iso_date_strings_without_separators():
    parser = isoparser()
    assert parser._parse_isodate_common("20230101") == ([2023, 1, 1], 8)
    assert parser._parse_isodate_common("19991231") == ([1999, 12, 31], 8)
    assert parser._parse_isodate_common("00010101") == ([1, 1, 1], 8)

def test_valid_iso_date_strings_with_separators():
    parser = isoparser(sep='-')
    assert parser._parse_isodate_common("2023-01-01") == ([2023, 1, 1], 10)
    assert parser._parse_isodate_common("1999-12-31") == ([1999, 12, 31], 10)
    assert parser._parse_isodate_common("0001-01-01") == ([1, 1, 1], 10)

def test_valid_iso_date_strings_with_custom_separators():
    parser = isoparser(sep='/')
    assert parser._parse_isodate_common("2023/01/01") == ([2023, 1, 1], 10)
    parser = isoparser(sep='.')
    assert parser._parse_isodate_common("1999.12.31") == ([1999, 12, 31], 10)
    parser = isoparser(sep='*')
    assert parser._parse_isodate_common("0001*01*01") == ([1, 1, 1], 10)

def test_short_strings():
    parser = isoparser()
    with pytest.raises(ValueError, match="ISO string too short"):
        parser._parse_isodate_common("202")
    with pytest.raises(ValueError, match="Invalid common month"):
        parser._parse_isodate_common("2023")
    with pytest.raises(ValueError, match="Invalid common month"):
        parser._parse_isodate_common("2023-0")

def test_invalid_separators():
    parser = isoparser(sep='-')
    with pytest.raises(ValueError, match="Invalid separator in ISO string"):
        parser._parse_isodate_common("2023-01/01")
    with pytest.raises(ValueError, match="Invalid ISO format"):
        parser._parse_isodate_common("2023-0101")
    with pytest.raises(ValueError, match="Invalid separator in ISO string"):
        parser._parse_isodate_common("2023/01-01")

def test_non_numeric_characters():
    parser = isoparser()
    with pytest.raises(ValueError):
        parser._parse_isodate_common("202A0101")
    with pytest.raises(ValueError):
        parser._parse_isodate_common("2023-0B-01")
    with pytest.raises(ValueError):
        parser._parse_isodate_common("2023-01-0C")

def test_out_of_range_components():
    parser = isoparser()
    with pytest.raises(ValueError):
        parser._parse_isodate_common("2023-13-01")
    with pytest.raises(ValueError):
        parser._parse_isodate_common("2023-00-01")
    with pytest.raises(ValueError):
        parser._parse_isodate_common("2023-01-32")
    with pytest.raises(ValueError):
        parser._parse_isodate_common("2023-01-00")

def test_leap_years():
    parser = isoparser(sep='-')
    assert parser._parse_isodate_common("2020-02-29") == ([2020, 2, 29], 10)
    with pytest.raises(ValueError):
        parser._parse_isodate_common("2019-02-29")

def test_empty_string():
    parser = isoparser()
    with pytest.raises(ValueError, match="ISO string too short"):
        parser._parse_isodate_common("")

def test_whitespace_handling():
    parser = isoparser(sep='-')
    with pytest.raises(ValueError):
        parser._parse_isodate_common(" 2023-01-01 ")
    with pytest.raises(ValueError):
        parser._parse_isodate_common("2023- 01-01")

def test_performance_large_data_samples():
    parser = isoparser()
    large_valid_string = "20230101" * 1000
    large_invalid_string = "2023010" * 1000
    # Testing performance with large valid string
    for i in range(0, len(large_valid_string), 8):
        assert parser._parse_isodate_common(large_valid_string[i:i+8]) == ([2023, 1, 1], 8)
    # Testing performance with large invalid string
    for i in range(0, len(large_invalid_string), 7):
        with pytest.raises(ValueError):
            parser._parse_isodate_common(large_invalid_string[i:i+7])

def test_custom_separator_edge_cases():
    with pytest.raises(ValueError):
        isoparser(sep='1')
    with pytest.raises(ValueError):
        isoparser(sep='--')
    with pytest.raises(ValueError):
        isoparser(sep='é')

def test_boundary_year_component():
    parser = isoparser(sep='-')
    assert parser._parse_isodate_common("9999-12-31") == ([9999, 12, 31], 10)
    assert parser._parse_isodate_common("0000-01-01") == ([0, 1, 1], 10)

def test_boundary_month_day_components():
    parser = isoparser(sep='-')
    assert parser._parse_isodate_common("2023-01-01") == ([2023, 1, 1], 10)
    assert parser._parse_isodate_common("2023-12-31") == ([2023, 12, 31], 10)

🔘 (none found) − ⏪ Replay Tests
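One common way to check that an optimization like this preserves behavior is a differential test: run the original and the optimized function on the same inputs and require identical outcomes, counting a raised exception as an outcome. The sketch below assumes nothing about how Codeflash itself verifies code; `outcome`, `find_mismatches` and the two toy year parsers are hypothetical names written only to illustrate the idea. It flags a variant that silently drops a length check, which is the risk that "removed unnecessary checks" carries.

```python
def outcome(fn, arg):
    """Return ("ok", result) or ("err", exception class name) for fn(arg)."""
    try:
        return ("ok", fn(arg))
    except Exception as exc:  # an exception is part of the observable behavior
        return ("err", type(exc).__name__)


def find_mismatches(reference, candidate, inputs):
    """List the inputs on which the two functions behave differently."""
    return [s for s in inputs if outcome(reference, s) != outcome(candidate, s)]


# Hypothetical toy functions for illustration only.
def year_reference(s):
    if len(s) < 4:
        raise ValueError("ISO string too short")
    return int(s[0:4])


def year_candidate(s):
    # "Optimized" variant that drops the length check.
    return int(s[:4])


inputs = ["2023", "20230101", "202", ""]
print(find_mismatches(year_reference, year_candidate, inputs))  # ['202']
```

The empty string does not show up as a mismatch because both versions raise `ValueError` on it; only `"202"` exposes the dropped check.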

codeflash-ai bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) on Jul 25, 2024