
how to configure the io's r, w, l parameters? #65

Open
An-tian opened this issue Oct 24, 2018 · 5 comments

Comments
An-tian commented Oct 24, 2018

I want to test dmclock in a ceph cluster (luminous, mclock_client), but I have two problems:

1. The r, w, l parameters are based on IOPS, and different block sizes correspond to different IOPS. How do I configure them?
2. In the recovery scenario, I want client IO performance to be reduced by only about 10%.

How can I adjust these parameters?
@ivancich

GTHubT commented Oct 24, 2018

@An-tian
As far as I know, the r, w, l parameters cannot provide IOPS control; they are only used for IO scheduling. If you need to reduce IOPS, you should use another method, such as a token bucket.
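
For reference, a token bucket is simple to sketch. Below is a minimal, hypothetical illustration in C++ (not code from dmclock or Ceph; all names are mine): tokens refill at `rate` per second up to `burst`, and an op is admitted only when a whole token is available.

```cpp
#include <algorithm>
#include <chrono>

// Minimal token-bucket sketch (hypothetical, not part of dmclock/Ceph):
// admits at most `rate` ops/sec on average, with bursts up to `burst`.
class TokenBucket {
  double rate;    // tokens added per second
  double burst;   // maximum tokens held
  double tokens;  // tokens currently available
  std::chrono::steady_clock::time_point last;

public:
  TokenBucket(double rate, double burst)
    : rate(rate), burst(burst), tokens(burst),
      last(std::chrono::steady_clock::now()) {}

  // Returns true if one op may proceed now.
  bool try_acquire() {
    auto now = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = now - last;
    last = now;
    // Refill based on elapsed time, capped at the burst size.
    tokens = std::min(burst, tokens + elapsed.count() * rate);
    if (tokens >= 1.0) {
      tokens -= 1.0;
      return true;
    }
    return false;
  }
};
```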

An-tian (Author) commented Oct 24, 2018

> @An-tian
> As far as I know, the r, w, l parameters cannot provide IOPS control; they are only used for IO scheduling. If you need to reduce IOPS, you should use another method, such as a token bucket.

I tested the IOPS with the rados tool:

1. When only client_io and subop occur, the rados tool returns iops1.
2. In the recovery scenario, the rados tool returns iops2.

I mean iops2 = iops1 * 90%.
Is my understanding wrong?

GTHubT commented Oct 24, 2018

Also, as I said before: if your server IO performance = 10000, client_io r = 8000, and recovery_io r = 2000, then the server will always provide client_io IOPS = 8000 and recovery IOPS = 2000, no matter what the l and w parameters are. If, in the recovery scenario, you want to achieve 90% of client performance as you said, you can change client_io r = 8000 * 90% and recovery_io r = 10000 - 8000 * 90%.
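
For concreteness, with the numbers above: client_io r = 8000 * 0.90 = 7200 IOPS, and recovery_io r = 10000 - 7200 = 2800 IOPS.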

ivancich (Member) commented Oct 26, 2018

Please note -- this issue is on the dmclock library, which is written to be generic so that it can be used in other projects. At the moment, the ceph use of the library is limited. So I'll address the general issue and then the ceph-specific issue.

Generally
The three parameters reservation, weight, and limit can be used. Reservation and limit are in ops per second. Weight is primarily a scalar -- it has no units, and weights are relative to one another. That said, according to the algorithm, a weight tag can be advanced due to the current clock value. Therefore it's beneficial not to have weight values so large that 1/weight would be too small a time increment. In other words, weights of 1 and 2 might be more effective than weights of 1,000,000,000 and 2,000,000,000.
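
To illustrate the point about 1/weight, here is a sketch of an mClock-style weight tag (illustrative only, not the library's actual code; the function name is mine):

```cpp
#include <algorithm>

// Sketch of an mClock-style weight tag (illustrative only). Each new
// request's tag advances 1/weight seconds past the previous tag, but
// never falls behind the current clock value `now`.
double next_weight_tag(double prev_tag, double now, double weight) {
  return std::max(now, prev_tag + 1.0 / weight);
}
```

With weights 1 and 2 the per-request increments are 1.0 s and 0.5 s; with weights 1,000,000,000 and 2,000,000,000 they are 1e-9 s and 5e-10 s -- so small that the max with `now` dominates and the relative weighting is effectively lost.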

Also, the latest version of the code allows a "cost" to be specified with an operation. That's supposed to be a measure of how long the operation would take to complete. So writes might have a larger cost than reads. And larger payloads (data-sizes) might have larger costs than smaller payloads.
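
Folding cost into the same sketch (again my own illustration, not the library's exact code), the tag increment would scale with the operation's cost rather than being a fixed 1/weight:

```cpp
#include <algorithm>

// Illustrative variant: a costlier op (e.g. a large write) advances the
// tag further, so it consumes proportionally more of the client's share.
double next_weight_tag_with_cost(double prev_tag, double now,
                                 double weight, double cost) {
  return std::max(now, prev_tag + cost / weight);
}
```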

Ceph-Specific (luminous)
The first thing to realize is that currently dmclock is an experimental feature in ceph, including luminous. We're working on making it work better and allow greater control.

With respect to the issue of cost mentioned above, the version of dmclock in ceph-luminous has an earlier version of cost that does not work particularly well. So that limits the effectiveness of dmclock in luminous.

Furthermore, all current versions of ceph do not strictly enforce the limit in dmclock. So when all ops with tags within the limit are retrieved, it will then look to ops with tags that are over the limit.

Also, there is a throttle between the dmclock op queue and the object store queue, and we're working on tuning that better, so dmclock can have greater influence on the scheduling.

And finally, we're primarily interested in changing the scheduling of different classes of operations. In luminous the five classes of operations are client_op, osd_subop, snap, recov, and scrub. So to get the result you're after, you might want to adjust the values of osd_op_queue_mclock_client_op_res, osd_op_queue_mclock_client_op_wgt, osd_op_queue_mclock_recov_res, and osd_op_queue_mclock_recov_wgt. But I suspect the influence will not be what you're hoping for.
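
For reference, an illustrative way to bias client ops over recovery with those options might look like the following (the values here are made up for the example, not tuned recommendations):

```
osd_op_queue_mclock_client_op_res = 1000.0
osd_op_queue_mclock_client_op_wgt = 10.0
osd_op_queue_mclock_recov_res = 1.0
osd_op_queue_mclock_recov_wgt = 1.0
```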

An-tian (Author) commented Nov 26, 2018

@ivancich
Thanks for your reply. I have already understood the framework of dmclock and its interface in ceph (mClockOpClassQueue.h && mClockPriorityQueue.h). I set allow_limit_break = false and added support for returning requests whose status is "future". I used the following configuration to test mclock_opclass:

osd_op_queue = mclock_opclass

osd_op_queue_mclock_client_op_lim = 0.000000
osd_op_queue_mclock_client_op_res = 200.000000
osd_op_queue_mclock_client_op_wgt = 100.000000
osd_op_queue_mclock_osd_subop_lim = 0.000000
osd_op_queue_mclock_osd_subop_res = 200.000000
osd_op_queue_mclock_osd_subop_wgt = 100.000000
osd_op_queue_mclock_recov_lim = 0.001000
osd_op_queue_mclock_recov_res = 10.000000
osd_op_queue_mclock_recov_wgt = 50.00000

osd_op_num_threads_per_shard_hdd = 10
osd_op_num_shards_hdd = 1

I want recovery_io to affect client_io as little as possible, but I found that dmclock doesn't seem to take effect, even though by log observation it still behaves in line with expectations. I am very confused!
Are there other options to pay attention to?

> With respect to the issue of cost mentioned above, the version of dmclock in ceph-luminous has an earlier version of cost that does not work particularly well.

I read the code for ceph 12.2.7 (cost passed as 0 in add_request) and ceph master (cost passed as 1 in add_request), and there seems to be no difference in my tests.

> Also, there is a throttle between the dmclock op queue and the object store queue, and we're working on tuning that better, so dmclock can have greater influence on the scheduling.

What does the throttle mean? Do you mean bluestore_throttle_bytes, bluestore_throttle_deferred_bytes, and bluestore_throttle_cost_per_io_hdd? I adjusted them to smaller values, but it had no effect.

What is the root cause of dmclock not working in ceph?
Is it because recovery has seized the processing power of bluestore, or because the cost is not considered?

I look forward to your reply, thank you very much.

An-tian closed this as completed Dec 6, 2018
An-tian reopened this Dec 6, 2018