-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: add ability to reinit incoming migration object #4756
Changes from all commits
47d4403
5e47e28
6b264bb
000ed3e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -151,17 +151,8 @@ class ClusterShardMigration { | |
atomic_bool pause_ = false; | ||
}; | ||
|
||
IncomingSlotMigration::IncomingSlotMigration(string source_id, Service* se, SlotRanges slots, | ||
uint32_t shards_num) | ||
: source_id_(std::move(source_id)), | ||
service_(*se), | ||
slots_(std::move(slots)), | ||
state_(MigrationState::C_CONNECTING), | ||
bc_(shards_num) { | ||
shard_flows_.resize(shards_num); | ||
for (unsigned i = 0; i < shards_num; ++i) { | ||
shard_flows_[i].reset(new ClusterShardMigration(i, &service_, this, bc_)); | ||
} | ||
IncomingSlotMigration::IncomingSlotMigration(string source_id, Service* se, SlotRanges slots) | ||
: source_id_(std::move(source_id)), service_(*se), slots_(std::move(slots)), bc_(0) { | ||
} | ||
|
||
IncomingSlotMigration::~IncomingSlotMigration() { | ||
BorysTheDev marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
@@ -203,7 +194,8 @@ bool IncomingSlotMigration::Join(long attempt) { | |
auto wait_res = bc_->WaitFor(wait_time); | ||
if (is_attempt_correct) { | ||
if (wait_res) { | ||
state_.store(MigrationState::C_FINISHED); | ||
util::fb2::LockGuard lk(state_mu_); | ||
state_ = MigrationState::C_FINISHED; | ||
keys_number_ = cluster::GetKeyCount(slots_); | ||
} else { | ||
LOG(WARNING) << "Can't join migration because of data after LSN for " << source_id_; | ||
|
@@ -215,7 +207,8 @@ bool IncomingSlotMigration::Join(long attempt) { | |
} | ||
|
||
void IncomingSlotMigration::Stop() { | ||
string_view log_state = state_.load() == MigrationState::C_FINISHED ? "Finishing" : "Cancelling"; | ||
util::fb2::LockGuard lk(state_mu_); | ||
string_view log_state = state_ == MigrationState::C_FINISHED ? "Finishing" : "Cancelling"; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. also when the migration is stopped can we have some state that will show that the migration is stopped otherwise we will show the last state - sync/connecting There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't want to. Adding additional states makes the code more complex; debugging commands should not add complexity. |
||
LOG(INFO) << log_state << " incoming migration of slots " << slots_.ToString(); | ||
cntx_.Cancel(); | ||
|
||
|
@@ -244,17 +237,31 @@ void IncomingSlotMigration::Stop() { | |
} | ||
} | ||
|
||
void IncomingSlotMigration::StartFlow(uint32_t shard, util::FiberSocketBase* source) { | ||
state_.store(MigrationState::C_SYNC); | ||
void IncomingSlotMigration::Init(uint32_t shards_num) { | ||
util::fb2::LockGuard lk(state_mu_); | ||
cntx_.Reset(nullptr); | ||
BorysTheDev marked this conversation as resolved.
Show resolved
Hide resolved
|
||
state_ = MigrationState::C_SYNC; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this one should be connecting and start flow should be sync |
||
|
||
bc_ = BlockingCounter(shards_num); | ||
shard_flows_.resize(shards_num); | ||
for (unsigned i = 0; i < shards_num; ++i) { | ||
shard_flows_[i].reset(new ClusterShardMigration(i, &service_, this, bc_)); | ||
} | ||
} | ||
|
||
void IncomingSlotMigration::StartFlow(uint32_t shard, util::FiberSocketBase* source) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. how do we protect startflow running at the same time with Stop that can run from cluster config command? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ClusterShardMigration::Start and ClusterShardMigration::Cancel methods are synchronized via mutex, so it's safe to call them simultaneously. If the start was called before Cancel, it's normal flow, if Cancel was called before Start(), the latest just do return |
||
shard_flows_[shard]->Start(&cntx_, source); | ||
VLOG(1) << "Incoming flow " << shard << " finished for " << source_id_; | ||
} | ||
|
||
size_t IncomingSlotMigration::GetKeyCount() const { | ||
if (state_.load() == MigrationState::C_FINISHED) { | ||
return keys_number_; | ||
{ | ||
util::fb2::LockGuard lk(state_mu_); | ||
if (state_ == MigrationState::C_FINISHED) { | ||
return keys_number_; | ||
} | ||
} | ||
|
||
return cluster::GetKeyCount(slots_); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that in this new flow the state of the incoming migration once you create the object but do not establish connection yet between source and target should be PRE_CONNECT or something similar that will help us understand the state of migration if we run dflycluster migration status command
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss migration states separately. I don't see how it can help us, but anycase, I think we need to make such changes for outgoing and incoming migration together so I will better create another PR.