
Mongo7 upgrade #140

Open · wants to merge 75 commits into develop

Conversation

Collaborator

@Xiangs18 Xiangs18 commented Dec 17, 2024

This PR does the following:

  • Removed all submodules (jars, kbapi_common, nms).
  • Added the NMS Docker service for local testing.
  • Fixed a bug in mock_auth/servicer.py.
  • Updated the catalog Docker build script.
  • Implemented lazy loading for MongoDB.
  • Upgraded MongoDB and cleaned up files in the repository.

Note: Next PR will add retryWrites and update release notes.


codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 95.81152% with 8 lines in your changes missing coverage. Please review.

Project coverage is 80.95%. Comparing base (dd630ce) to head (26d8910).

Files with missing lines | Patch % | Lines
lib/biokbase/catalog/db.py | 95.81% | 8 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #140      +/-   ##
===========================================
+ Coverage    80.14%   80.95%   +0.80%     
===========================================
  Files            8        8              
  Lines         2695     2846     +151     
===========================================
+ Hits          2160     2304     +144     
- Misses         535      542       +7     


"""Create indexes for the given collection lazily."""
collection = self.db[collection_name]

# Get the indexes for the collection from the DBIndexes class
Collaborator

It seems that all of this code can go into the db_indexes.py code.

Collaborator Author

You mean line 242?

Collaborator Author

@Xiangs18 Xiangs18 Feb 14, 2025

If you mean the entire thing, it makes more sense to me to separate the steps of getting and creating indexes.

Collaborator

@bio-boris bio-boris Feb 14, 2025

I agree. What I am suggesting is that the code that creates the indexes be encapsulated in the DBIndexes class.

Collaborator Author

Does it look better now?

Collaborator Author

No more elif. The _create_indexes function is now called in __init__.
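
A rough sketch of what that could look like, assuming a MongoCatalogDBI-style wrapper; the index map below is trimmed and illustrative, not the actual diff:

from pymongo import ASCENDING, MongoClient

class MongoCatalogDBI:
    # Illustrative, trimmed map of collection name -> index key specs.
    _INDEXES = {
        "favorites": [[("user", ASCENDING)], [("module_name_lc", ASCENDING)]],
        "build_logs": [[("timestamp", ASCENDING)], [("git_url", ASCENDING)]],
    }

    def __init__(self, mongo_host, mongo_db):
        self.mongoclient = MongoClient(mongo_host)
        self.db = self.mongoclient[mongo_db]
        # Build every index up front instead of branching on the collection
        # name (the elif approach) at first use.
        self._create_indexes()

    def _create_indexes(self):
        for collection_name, key_lists in self._INDEXES.items():
            for keys in key_lists:
                self.db[collection_name].create_index(keys)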

Collaborator

@bio-boris bio-boris Feb 21, 2025

I think the original way it looked was much better: the version before the elifs were introduced declaratively made each function call, and it is less confusing than the way it is now.

Collaborator Author

We are reverting to the previous approach because lazy Mongo loading is no longer needed. It is done:

https://github.com/kbase/catalog/blob/dev-mongo7_upgrade/lib/biokbase/catalog/db.py#L210

@@ -0,0 +1,132 @@
from pymongo import ASCENDING
Collaborator

It seems a bit strange to create indexes based on a specific collection name being passed in, rather than just creating them all in one go.

Here's one way to do it

from pymongo import ASCENDING, IndexModel
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class DBIndexes:
    _INDEX_MAP = {
        "module_versions": [
            {"fields": ["module_name_lc"], "unique": False, "sparse": False},
            {"fields": ["git_commit_hash"], "unique": False, "sparse": False},
            {"fields": ["module_name_lc", "git_commit_hash"], "unique": True, "sparse": False},
        ],
        "local_functions": [
            {"fields": ["function_id"], "unique": False, "sparse": False},
            {"fields": ["module_name_lc"], "unique": False, "sparse": False},
            {"fields": ["git_commit_hash"], "unique": False, "sparse": False},
            {"fields": ["module_name_lc", "function_id", "git_commit_hash"], "unique": True, "sparse": False},
        ],
        "developers": [
            {"fields": ["kb_username"], "unique": True, "sparse": False},
        ],
        "build_logs": [
            {"fields": ["registration_id"], "unique": True, "sparse": False},
            {"fields": ["module_name_lc"], "unique": False, "sparse": False},
            {"fields": ["timestamp"], "unique": False, "sparse": False},
            {"fields": ["registration"], "unique": False, "sparse": False},
            {"fields": ["git_url"], "unique": False, "sparse": False},
            {"fields": ["current_versions.release.release_timestamp"], "unique": False, "sparse": False},
        ],
        "favorites": [
            {"fields": ["user"], "unique": False, "sparse": False},
            {"fields": ["module_name_lc"], "unique": False, "sparse": False},
            {"fields": ["id"], "unique": False, "sparse": False},
            {"fields": ["user", "id", "module_name_lc"], "unique": True, "sparse": False},
        ],
        "exec_stats_raw": [
            {"fields": ["user_id"], "unique": False, "sparse": False},
            {"fields": ["app_module_name", "app_id"], "unique": False, "sparse": True},
            {"fields": ["func_module_name", "func_name"], "unique": False, "sparse": True},
            {"fields": ["creation_time"], "unique": False, "sparse": False},
            {"fields": ["finish_time"], "unique": False, "sparse": False},
        ],
        "exec_stats_apps": [
            {"fields": ["module_name"], "unique": False, "sparse": True},
            {"fields": ["full_app_id", "type", "time_range"], "unique": True, "sparse": False},
            {"fields": ["type", "time_range"], "unique": False, "sparse": True},
        ],
        "exec_stats_users": [
            {"fields": ["user_id", "type", "time_range"], "unique": True, "sparse": False},
        ],
        "client_groups": [
            {"fields": ["module_name_lc", "function_name"], "unique": True, "sparse": False},
        ],
        "volume_mounts": [
            {"fields": ["client_group", "module_name_lc", "function_name"], "unique": True, "sparse": False},
        ],
        "secure_config_params": [
            {"fields": ["module_name_lc"], "unique": False, "sparse": False},
            {"fields": ["module_name_lc", "version", "param_name"], "unique": True, "sparse": False},
        ],
    }

    @classmethod
    def get_indexes(cls, collection_name):
        return cls._INDEX_MAP.get(collection_name, [])

    @staticmethod
    def create_indexes(collection, indexes):
        # Collection.create_indexes() expects IndexModel objects, so build
        # them from the declarative definitions above.
        index_models = [
            IndexModel(
                [(field, ASCENDING) for field in index["fields"]],
                unique=index.get("unique", False),
                sparse=index.get("sparse", False),
            )
            for index in indexes
        ]

        collection.create_indexes(index_models)
        logger.info(f"Created indexes: {index_models}")

Collaborator

I forget, why did this code even get changed to this instead of keeping the original code?

Collaborator

ec3064b: looks like the code was fine before, but a bunch of else-ifs were added.

Collaborator

It did everything in one go, rather than doing things by a specific collection.

Collaborator Author

@Xiangs18 Xiangs18 Feb 14, 2025

The original Mongo initialization is in __init__, and it created the collections/indexes in a single operation. When we switched to lazy Mongo loading, each collection/index was created only once, on its first use, which is why there are multiple 'elif' statements.

Now we are switching to #140 (comment)

Collaborator

I think keeping the "declarative" and straightforward style it had before would make this code less confusing. In addition, this could be run as a script before the server starts up.

Collaborator Author

The _create_indexes function is now called in __init__.

Collaborator

I don't understand what you are saying.

Collaborator Author

There is no longer a db_indexes.py file. This thread is no longer relevant.

- method_spec_temp_dir=narrative_method_store_temp
- method_spec_mongo_host=mongo:27017
- method_spec_mongo_dbname=method_store_repo_db
- method_spec_admin_users=${ADMIN_USER}
Member

How do you expect a dev to deal with this when testing locally?

Collaborator Author

For testing locally, either export the variable directly in the shell, or hardcode an admin user in the YAML file.

Member

How would a local dev know to do that?

Collaborator Author

Added a comment.

Member

A first-time dev is still going to have a hard time figuring out why the tests don't pass when they run them. Put yourself in their shoes: if you were running the tests for the first time, what would happen if you didn't know about this env var, and how easy would it be to debug? How would you want this to work and be documented?

@Xiangs18
Collaborator Author

[Screenshot attached: 2025-02-25 at 12:28 PM]

Comment on lines +170 to +175
except ConnectionFailure as e:
    error_msg = "Cannot connect to Mongo server\n"
    error_msg += "ERROR -- {}:\n{}".format(
        e, "".join(traceback.format_exception(None, e, e.__traceback__))
    )
    raise ValueError(error_msg)
Member

Suggested change
-except ConnectionFailure as e:
-    error_msg = "Cannot connect to Mongo server\n"
-    error_msg += "ERROR -- {}:\n{}".format(
-        e, "".join(traceback.format_exception(None, e, e.__traceback__))
-    )
-    raise ValueError(error_msg)
+except ConnectionFailure as e:
+    raise ValueError(f"Cannot connect to Mongo server: {e}") from e


def _initialize_mongo_client(self):
    """Initialize MongoDB client."""
    # Use the lock to ensure only one thread initializes the mongo client at a time
Member

I think this comment is in the wrong place
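
For context, a minimal sketch of the double-checked locking pattern that comment appears to describe; the attribute and lock names are assumptions, not the PR's actual code:

import threading
from pymongo import MongoClient

class MongoCatalogDBI:
    def __init__(self, mongo_host):
        self._mongo_host = mongo_host
        self._client = None
        self._client_lock = threading.Lock()

    def _initialize_mongo_client(self):
        """Initialize the MongoDB client lazily and thread-safely."""
        if self._client is None:
            # The lock (and the comment about it) belongs here: it ensures
            # only one thread builds the client, and the second check avoids
            # re-initializing it.
            with self._client_lock:
                if self._client is None:
                    self._client = MongoClient(self._mongo_host)
        return self._client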

Comment on lines -1324 to 1444
-def update_db_1_to_2(self):
-    for m in self.modules.find({'release_versions': {'$exists': True}}):
+def update_db_1_to_2(self, db):
+    modules_collection = db[MongoCatalogDBI._MODULES]
+    for m in modules_collection.find({'release_versions': {'$exists': True}}):
         release_version_list = []
Member

Why can't we delete all these upgrade methods again?
