Mongo7 upgrade #140
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #140      +/-   ##
===========================================
+ Coverage    80.14%   80.95%   +0.80%
===========================================
  Files            8        8
  Lines         2695     2846     +151
===========================================
+ Hits          2160     2304     +144
- Misses         535      542       +7
lib/biokbase/catalog/db.py
Outdated
    """Create indexes for the given collection lazily."""
    collection = self.db[collection_name]

    # Get the indexes for the collection from the DBIndexes class
It seems that all of this code can go into the db_indexes.py code.
You mean line 242?
If you mean the entire thing, it makes more sense to me to separate the steps of getting and creating indexes.
I agree. What I am suggesting is the code that creates the indexes be encapsulated in the DBIndexes class
Does it look better now?
No more elif. Call the _create_indexes function in __init__.
I think the original way it looked was much better. The version from before elif was introduced, which declaratively made each function call, is less confusing than the way it is now.
We are reverting to the previous method because lazy Mongo load is no longer needed. It is done.
https://github.com/kbase/catalog/blob/dev-mongo7_upgrade/lib/biokbase/catalog/db.py#L210
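For readers following the thread, the "declarative" style being discussed might look roughly like the sketch below. This is not the code from the PR: the class name `CatalogDB`, the `INDEXES` table, and the two collections shown are illustrative assumptions. The only real pymongo call is `Collection.create_index(keys, **options)`, and `ASCENDING` is inlined (it equals `pymongo.ASCENDING`) so the sketch has no dependency.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING; inlined to keep the sketch dependency-free

# Hypothetical table: collection name -> list of (fields, index options).
INDEXES = {
    "developers": [
        (["kb_username"], {"unique": True, "sparse": False}),
    ],
    "client_groups": [
        (["module_name_lc", "function_name"], {"unique": True, "sparse": False}),
    ],
}


class CatalogDB:
    """Hypothetical stand-in for the DB wrapper; `db` is expected to be a pymongo Database."""

    def __init__(self, db):
        self.db = db
        # Eager setup: every index is requested once at construction,
        # so no per-collection elif chain is needed.
        self._create_indexes()

    def _create_indexes(self):
        for collection_name, specs in INDEXES.items():
            for fields, options in specs:
                keys = [(field, ASCENDING) for field in fields]
                self.db[collection_name].create_index(keys, **options)
```

Requesting an index that already exists with the same definition is a no-op on the server, so running this on every startup should be safe under that assumption.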
lib/biokbase/catalog/db_indexes.py
Outdated
@@ -0,0 +1,132 @@
from pymongo import ASCENDING
It seems a bit strange to create indexes based on a specific collection name rather than just creating them in one go.
Here's one way to do it:
from pymongo import ASCENDING, IndexModel
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class DBIndexes:
    # Maps each collection name to the indexes it needs.
    _INDEX_MAP = {
        "module_versions": [
            {"fields": ["module_name_lc"], "unique": False, "sparse": False},
            {"fields": ["git_commit_hash"], "unique": False, "sparse": False},
            {"fields": ["module_name_lc", "git_commit_hash"], "unique": True, "sparse": False},
        ],
        "local_functions": [
            {"fields": ["function_id"], "unique": False, "sparse": False},
            {"fields": ["module_name_lc"], "unique": False, "sparse": False},
            {"fields": ["git_commit_hash"], "unique": False, "sparse": False},
            {"fields": ["module_name_lc", "function_id", "git_commit_hash"], "unique": True, "sparse": False},
        ],
        "developers": [
            {"fields": ["kb_username"], "unique": True, "sparse": False},
        ],
        "build_logs": [
            {"fields": ["registration_id"], "unique": True, "sparse": False},
            {"fields": ["module_name_lc"], "unique": False, "sparse": False},
            {"fields": ["timestamp"], "unique": False, "sparse": False},
            {"fields": ["registration"], "unique": False, "sparse": False},
            {"fields": ["git_url"], "unique": False, "sparse": False},
            {"fields": ["current_versions.release.release_timestamp"], "unique": False, "sparse": False},
        ],
        "favorites": [
            {"fields": ["user"], "unique": False, "sparse": False},
            {"fields": ["module_name_lc"], "unique": False, "sparse": False},
            {"fields": ["id"], "unique": False, "sparse": False},
            {"fields": ["user", "id", "module_name_lc"], "unique": True, "sparse": False},
        ],
        "exec_stats_raw": [
            {"fields": ["user_id"], "unique": False, "sparse": False},
            {"fields": ["app_module_name", "app_id"], "unique": False, "sparse": True},
            {"fields": ["func_module_name", "func_name"], "unique": False, "sparse": True},
            {"fields": ["creation_time"], "unique": False, "sparse": False},
            {"fields": ["finish_time"], "unique": False, "sparse": False},
        ],
        "exec_stats_apps": [
            {"fields": ["module_name"], "unique": False, "sparse": True},
            {"fields": ["full_app_id", "type", "time_range"], "unique": True, "sparse": False},
            {"fields": ["type", "time_range"], "unique": False, "sparse": True},
        ],
        "exec_stats_users": [
            {"fields": ["user_id", "type", "time_range"], "unique": True, "sparse": False},
        ],
        "client_groups": [
            {"fields": ["module_name_lc", "function_name"], "unique": True, "sparse": False},
        ],
        "volume_mounts": [
            {"fields": ["client_group", "module_name_lc", "function_name"], "unique": True, "sparse": False},
        ],
        "secure_config_params": [
            {"fields": ["module_name_lc"], "unique": False, "sparse": False},
            {"fields": ["module_name_lc", "version", "param_name"], "unique": True, "sparse": False},
        ],
    }

    @classmethod
    def get_indexes(cls, collection_name):
        """Return the index specs for a collection, or an empty list if none are defined."""
        return cls._INDEX_MAP.get(collection_name, [])

    @staticmethod
    def create_indexes(collection, indexes):
        """Create all given indexes on the collection in a single call."""
        # Collection.create_indexes() expects IndexModel instances, not plain dicts.
        index_models = [
            IndexModel(
                [(field, ASCENDING) for field in index["fields"]],
                unique=index.get("unique", False),
                sparse=index.get("sparse", False),
            )
            for index in indexes
        ]
        collection.create_indexes(index_models)
        logger.info(f"Created indexes: {[model.document for model in index_models]}")
I forget, why did this code even get changed to this instead of keeping the original code?
ec3064b Looks like the code was fine before but a bunch of else ifs were added
It did everything in one go, rather than doing things by a specific collection.
The original Mongo initialization was in __init__, and it created the collections and indexes in a single operation. When we switched to lazy Mongo loading, each collection and its indexes were created only once, on first use, which is why there are multiple 'elif' statements.
Now we are switching to #140 (comment)
I think keeping it in the "declarative" and straightforward style that it had before would make this code less confusing. In addition, this can be run as a script before the server starts up.
Call the _create_indexes function in __init__.
I don't understand what you are saying.
There is no longer a db_indexes.py file. This thread is no longer relevant.
- method_spec_temp_dir=narrative_method_store_temp
- method_spec_mongo_host=mongo:27017
- method_spec_mongo_dbname=method_store_repo_db
- method_spec_admin_users=${ADMIN_USER}
How do you expect a dev to deal with this when testing locally?
For testing locally, either export the variable directly in the shell, or hardcode an admin user in the YAML file.
How would a local dev know to do that?
Added a comment.
A first-time dev is still going to have a hard time figuring out why the tests don't pass when they run them. Put yourself in their shoes: if you were running the tests for the first time and didn't know about this env var, what would happen, and how easy would it be to debug? How would you want this to work and be documented?
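One way to address the concern above, sketched under assumptions: have the test setup fail fast with an actionable message when the variable is missing, instead of letting tests fail for an unrelated-looking reason. The helper name `require_env` is hypothetical and does not exist in the repository; where it would be called (for example a test fixture) is also an assumption.

```python
import os


def require_env(name, environ=None):
    """Return the value of environment variable `name`, or raise with a setup hint.

    `environ` defaults to os.environ; it is a parameter only so the helper is easy to test.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            f"Export it before running the tests, e.g. `export {name}=<your_username>`."
        )
    return value
```

A test fixture could then call `require_env("ADMIN_USER")` before starting any containers, so a first-time developer sees the missing variable named explicitly.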
except ConnectionFailure as e:
    error_msg = "Cannot connect to Mongo server\n"
    error_msg += "ERROR -- {}:\n{}".format(
        e, "".join(traceback.format_exception(None, e, e.__traceback__))
    )
    raise ValueError(error_msg)
Suggested change:
- except ConnectionFailure as e:
-     error_msg = "Cannot connect to Mongo server\n"
-     error_msg += "ERROR -- {}:\n{}".format(
-         e, "".join(traceback.format_exception(None, e, e.__traceback__))
-     )
-     raise ValueError(error_msg)
+ except ConnectionFailure as e:
+     raise ValueError(f"Cannot connect to Mongo server: {e}") from e
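The reason the shorter form loses nothing: `raise ... from e` stores the original exception on `__cause__`, and Python prints both tracebacks when the error is unhandled, so formatting the traceback into the message by hand is redundant. A minimal sketch follows; `ConnectionFailure` here is a local stand-in for `pymongo.errors.ConnectionFailure`, and `connect` and `failing_ping` are hypothetical names used only for illustration.

```python
class ConnectionFailure(Exception):
    """Local stand-in for pymongo.errors.ConnectionFailure so the sketch runs without pymongo."""


def connect(ping):
    """Call `ping` and translate a connection failure into a ValueError, keeping the cause."""
    try:
        ping()
    except ConnectionFailure as e:
        # `from e` chains the original exception onto the new one.
        raise ValueError(f"Cannot connect to Mongo server: {e}") from e


def failing_ping():
    """Hypothetical ping that always fails."""
    raise ConnectionFailure("connection refused")
```

Callers that need the underlying error can read it from the `__cause__` attribute of the raised `ValueError`.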
def _initialize_mongo_client(self):
    """Initialize MongoDB client."""
    # Use the lock to ensure only one thread initializes the mongo client at a time
I think this comment is in the wrong place
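The body of `_initialize_mongo_client` is not shown in this thread, so the following is only a guess at what a better placement could look like: the comment sits directly above the statement that acquires the lock. The class `LazyClient` and its `factory` argument are hypothetical names, and the double-checked locking pattern is an assumption about the intent, not the PR's code.

```python
import threading


class LazyClient:
    """Hypothetical wrapper that builds a client on first use, at most once."""

    def __init__(self, factory):
        self._factory = factory  # callable that constructs the real client
        self._client = None
        self._lock = threading.Lock()

    def get(self):
        if self._client is None:  # fast path: skip the lock once initialized
            # Use the lock to ensure only one thread initializes the client at a time.
            with self._lock:
                if self._client is None:  # re-check after acquiring the lock
                    self._client = self._factory()
        return self._client
```

Placing the comment next to the `with self._lock:` line keeps it attached to the code it describes even if the method signature or docstring changes.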
- def update_db_1_to_2(self):
-     for m in self.modules.find({'release_versions': {'$exists': True}}):
+ def update_db_1_to_2(self, db):
+     modules_collection = db[MongoCatalogDBI._MODULES]
+     for m in modules_collection.find({'release_versions': {'$exists': True}}):
          release_version_list = []
Why can't we delete all these upgrade methods again?
This PR does the following:
Note: Next PR will add retryWrites and update release notes.