Commit da333e9

leftwo and Alan Hanson authored
Added more DTrace scripts. (#1309)
Additional scripts, single_up_info.d and sled_upstairs_info.d, give you a
script for a specific PID or for an entire system.
A new script, all_downstairs.d, displays IO stats for all downstairs
running on a system.

Updated the README.

single_up_info.d requires a PID and adds a SESSION column to tell apart
different upstairs inside a single process.
sled_upstairs_info.d has both a PID and a SESSION column and prints
stats for the whole system.

Co-authored-by: Alan Hanson <[email protected]>
1 parent 8c6d485 commit da333e9

5 files changed

+369
-0
lines changed

package-manifest.toml

+3
@@ -22,5 +22,8 @@ source.paths = [
     { from = "tools/dtrace/upstairs_raw.d", to = "/opt/oxide/dtrace/upstairs_raw.d" },
     { from = "tools/dtrace/get-lr-state.sh", to = "/opt/oxide/dtrace/get-lr-state.sh" },
     { from = "tools/dtrace/get-ds-state.sh", to = "/opt/oxide/dtrace/get-ds-state.sh" },
+    { from = "tools/dtrace/single_up_info.d", to = "/opt/oxide/dtrace/single_up_info.d" },
+    { from = "tools/dtrace/sled_upstairs_info.d", to = "/opt/oxide/dtrace/sled_upstairs_info.d" },
+    { from = "tools/dtrace/all_downstairs.d", to = "/opt/oxide/dtrace/all_downstairs.d" },
 ]
 output.type = "zone"

tools/dtrace/README.md

+85
@@ -1,5 +1,46 @@
 # Oxide DTrace Crucible scripts

+## all_downstairs.d
+A DTrace script to show IOs coming and going on all downstairs as well as the
+work task getting new work, performing the work and completing the work. Stats
+are printed at a 4 second interval.
+
+The columns show counts in the last 4 seconds of:
+F>  Flush coming in from the upstairs.
+F<  Flush completed message being sent back to the upstairs.
+W>  Write coming in from the upstairs.
+W<  Write completed message being sent back to the upstairs.
+R>  Read coming in from the upstairs.
+R<  Read completed message being sent back to the upstairs.
+WS  An IO has been submitted to the work task in the downstairs.
+WIP An IO is taken off the work queue by the downstairs work task.
+WD  An IO is completed by the downstairs work task.
+
+If a downstairs has not done any IOs it will either print no line, or
+print a line of zeros.
+
+```
+EVT22200005 # dtrace -s /alan/dtrace/all_downstairs.d
+  PID   F>   F<   W>   W<    R>    R<    WS   WIP    WD
+13790   10    9 1911 1835     0     0  1943  1867  1867
+25574   10   10 2204 2082     0     0  2237  2115  2114
+25442   10   10 2204 2089     0     0  2236  2122  2121
+  PID   F>   F<   W>   W<    R>    R<    WS   WIP    WD
+17147    2    2    0    0   389   389   391   391   391
+25492    2    2    0    0   389   389   391   391   391
+25627    2    2    0    0   389   389   391   391   391
+25442   10    9 2283 2177     0     0  2315  2207  2208
+25574   10    9 2283 2184     0     0  2314  2214  2215
+13790   10   10 2054 2030     0     0  2085  2061  2061
+  PID   F>   F<   W>   W<    R>    R<    WS   WIP    WD
+17147    2    2    2    2     0     0     4     4     4
+25492    2    2    2    2     0     0     4     4     4
+25627    2    2    2    2     0     0     4     4     4
+13790   10   10 1961 1985     0     0  1994  2018  2018
+25442   10   10 2042 2185     0     0  2074  2218  2217
+25574   10   10 2045 2185     0     0  2077  2218  2217
+```
+
 ## downstairs_count.d
 A DTrace script to show IOs coming and going on a downstairs as well as the
 work task getting new work, performing the work and completing the work. This
@@ -339,6 +380,50 @@ Trace a downstairs IO and measure time for in in the following three parts:
 * 2nd report is OS time (for flush, to flush all extents)
 * 3rd report is OS done to downstairs sending the ACK back to upstairs

+## single_up_info.d
+Similar to upstairs_info.d, this script prints out various counters in
+the upstairs. However, you specify a PID and it will display stats for
+only that PID. See upstairs_info.d for a description of the columns.
+
+```
+EVT22200005 # dtrace -s single_up_info.d 15579
+SESSION DS STATE 0 DS STATE 1 DS STATE 2 UPW DSW NEXT_JOB BAKPR WRITE_BO NEW0 NEW1 NEW2 IP0 IP1 IP2 D0 D1 D2 S0 S1 S2 ER0 ER1 ER2 EC0 EC1 EC2
+c0b92059 live_repair active active 3 435 570215 2761 226492416 0 0 0 40 241 241 24 194 194 371 0 0 9384 0 0 0 0 0
+a666a8bd live_repair active active 2 3 90656 0 0 0 0 0 2 1 1 1 2 2 0 0 0 7561 0 0 11640 0 0
+a666a8bd live_repair active active 2 11 90664 0 0 0 0 0 2 1 1 9 10 10 0 0 0 7563 0 0 11640 0 0
+c0b92059 live_repair active active 3 514 570762 3111 237219840 0 0 0 67 234 234 33 280 280 414 0 0 9385 0 0 0 0 0
+c0b92059 live_repair active active 3 329 571129 2929 231735296 0 0 0 1 227 251 59 102 78 269 0 0 9386 0 0 0 0 0
+a666a8bd live_repair active active 2 19 90672 0 0 0 0 0 2 1 1 17 18 18 0 0 0 7565 0 0 11640 0 0
+c0b92059 live_repair active active 3 339 571544 512 127401984 0 0 0 1 139 137 54 200 202 284 0 0 9387 0 0 0 0 0
+a666a8bd live_repair active active 2 23 90676 0 0 0 0 0 2 1 1 21 22 22 0 0 0 7566 0 0 11640 0 0
+c0b92059 live_repair active active 3 389 572038 221 101711872 0 0 0 1 112 112 67 277 277 321 0 0 9388 0 0 0 0 0
+a666a8bd live_repair active active 2 31 90684 0 0 0 0 0 2 1 1 29 30 30 0 0 0 7568 0 0 11640 0 0
+```
+## sled_upstairs_info.d
+Similar to upstairs_info.d, this script prints out various counters in
+the upstairs for all processes that have an upstairs running on the system.
+See upstairs_info.d for a description of the columns.
+This script adds a PID and a SESSION to identify which upstairs we are
+reporting stats for.
+
+```
+EVT22200005 # dtrace -s sled_upstairs_info.d
+PID SESSION DS STATE 0 DS STATE 1 DS STATE 2 UPW DSW NEXT_JOB BAKPR WRITE_BO NEW0 NEW1 NEW2 IP0 IP1 IP2 D0 D1 D2 S0 S1 S2 ER0 ER1 ER2 EC0 EC1 EC2
+15579 c0b92059 live_repair active active 3 367 656347 1616 185597952 0 0 0 75 195 195 69 172 172 223 0 0 9589 0 0 0 0 0
+15579 a666a8bd live_repair active active 2 95 91960 0 0 0 0 0 2 1 1 93 94 94 0 0 0 7827 0 0 11667 0 0
+24948 fac8cbba new new new 0 0 1000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+24948 fac8cbba new new new 0 0 1000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+24948 79d92ceb active active active 0 0 1000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+15579 c0b92059 live_repair active active 3 432 656863 2077 203423744 0 0 0 104 168 168 68 264 264 260 0 0 9590 0 0 0 0 0
+15579 a666a8bd live_repair active active 2 99 91964 0 0 0 0 0 2 1 1 97 98 98 0 0 0 7828 0 0 11667 0 0
+24948 127b8de5 new new new 0 0 1000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+24948 fac8cbba new new new 0 0 1000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+24948 79d92ceb active active active 0 0 1000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+15579 c0b92059 live_repair active active 4 529 657227 4805 282066944 0 0 0 95 296 296 80 233 233 354 0 0 9591 0 0 0 0 0
+15579 a666a8bd live_repair active active 2 107 91972 0 0 0 0 0 2 1 1 105 106 106 0 0 0 7830 0 0 11667 0 0
+
+```
+
 ## upstairs_action.d
 This is a dtrace script for printing the counts of the upstairs main action
 loop.

tools/dtrace/all_downstairs.d

+77
@@ -0,0 +1,77 @@
+#pragma D option quiet
+/*
+ * Print IO counters for all running downstairs.
+ */
+crucible_downstairs*:::submit-flush-start
+{
+    @sf_start[pid] = count();
+}
+
+crucible_downstairs*:::submit-flush-done
+{
+    @sf_done[pid] = count();
+}
+
+crucible_downstairs*:::submit-write-start
+{
+    @sw_start[pid] = count();
+}
+
+crucible_downstairs*:::submit-write-done
+{
+    @sw_done[pid] = count();
+}
+
+crucible_downstairs*:::submit-read-start
+{
+    @sr_start[pid] = count();
+}
+
+crucible_downstairs*:::submit-read-done
+{
+    @sr_done[pid] = count();
+}
+
+crucible_downstairs*:::submit-writeunwritten-start
+{
+    @swu_start[pid] = count();
+}
+
+crucible_downstairs*:::submit-writeunwritten-done
+{
+    @swu_done[pid] = count();
+}
+crucible_downstairs*:::work-start
+{
+    @work_start[pid] = count();
+}
+crucible_downstairs*:::work-process
+{
+    @work_process[pid] = count();
+}
+crucible_downstairs*:::work-done
+{
+    @work_done[pid] = count();
+}
+
+
+tick-4s
+{
+    printf("%5s %4s %4s %4s %4s %5s %5s %5s %5s %5s\n",
+        "PID", "F>", "F<", "W>", "W<", "R>", "R<", "WS", "WIP", "WD");
+    printa("%05d %@4u %@4u %@4u %@4u %@5u %@5u %@5u %@5u %@5u\n",
+        @sf_start, @sf_done, @sw_start, @sw_done, @sr_start, @sr_done,
+        @work_start, @work_process, @work_done
+    );
+    clear(@sf_start);
+    clear(@sf_done);
+    clear(@sw_start);
+    clear(@sw_done);
+    clear(@sr_start);
+    clear(@sr_done);
+    clear(@swu_start);
+    clear(@swu_done);
+    clear(@work_start);
+    clear(@work_process);
+    clear(@work_done);
+}
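
Note on consuming the output of all_downstairs.d: the counters are per 4 second interval, so comparing the "incoming" and "completed" columns for a PID gives a rough sense of whether a downstairs is keeping up in that window. The Python sketch below is one possible way to do that and is not part of this commit. It assumes the whitespace-separated column layout shown in the README example; the names `parse_rows` and `backlog` are hypothetical.

```python
from typing import Dict, List

# Column names after PID, in the order all_downstairs.d prints them.
COLUMNS = ["F>", "F<", "W>", "W<", "R>", "R<", "WS", "WIP", "WD"]


def parse_rows(text: str) -> List[Dict[str, int]]:
    """Parse all_downstairs.d output into one dict per data row.

    Header lines (first field "PID") and lines with an unexpected
    number of fields are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) + 1 or fields[0] == "PID":
            continue
        row = {"PID": int(fields[0])}
        for name, value in zip(COLUMNS, fields[1:]):
            row[name] = int(value)
        rows.append(row)
    return rows


def backlog(row: Dict[str, int]) -> Dict[str, int]:
    """Arrivals minus completions for one PID in one interval.

    A negative value only means more work completed than arrived
    during that interval.
    """
    return {
        "flush": row["F>"] - row["F<"],
        "write": row["W>"] - row["W<"],
        "read": row["R>"] - row["R<"],
        "work": row["WS"] - row["WD"],
    }


# Two rows copied from the README example output.
SAMPLE = """\
PID F> F< W> W< R> R< WS WIP WD
13790 10 9 1911 1835 0 0 1943 1867 1867
17147 2 2 0 0 389 389 391 391 391
"""

for row in parse_rows(SAMPLE):
    print(row["PID"], backlog(row))
```

Because each interval is counted independently, a single positive or negative difference is not meaningful on its own; a difference that stays positive over many intervals is the more interesting signal.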

tools/dtrace/single_up_info.d

+106
@@ -0,0 +1,106 @@
+/*
+ * Display internal Upstairs status for the PID provided as $1
+ */
+#pragma D option quiet
+#pragma D option strsize=1k
+/*
+ * Print the header right away
+ */
+dtrace:::BEGIN
+{
+    show = 21;
+}
+
+/*
+ * Every second, check and see if we have printed enough that it is
+ * time to print the header again
+ */
+tick-1s
+/show > 20/
+{
+    printf("%8s ", "SESSION");
+    printf("%17s %17s %17s", "DS STATE 0", "DS STATE 1", "DS STATE 2");
+    printf(" %5s %5s %9s %5s", "UPW", "DSW", "NEXT_JOB", "BAKPR");
+    printf(" %10s", "WRITE_BO");
+    printf(" %5s %5s %5s", "NEW0", "NEW1", "NEW2");
+    printf(" %5s %5s %5s", "IP0", "IP1", "IP2");
+    printf(" %5s %5s %5s", "D0", "D1", "D2");
+    printf(" %5s %5s %5s", "S0", "S1", "S2");
+    printf(" %5s %5s %5s", "ER0", "ER1", "ER2");
+    printf(" %5s %5s %5s", "EC0", "EC1", "EC2");
+    printf("\n");
+    show = 0;
+}
+
+crucible_upstairs*:::up-status
+/pid==$1/
+{
+    show = show + 1;
+    session_id = json(copyinstr(arg1), "ok.session_id");
+
+    /*
+     * I'm not very happy about this, but if we don't print it all on one
+     * line, then multiple sessions will clobber each other's output.
+     */
+    printf("%8s %17s %17s %17s %5s %5s %9s %5s %10s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s\n",
+
+        substr(session_id, 0, 8),
+
+        /*
+         * State for the three downstairs
+         */
+        json(copyinstr(arg1), "ok.ds_state[0]"),
+        json(copyinstr(arg1), "ok.ds_state[1]"),
+        json(copyinstr(arg1), "ok.ds_state[2]"),
+
+        /*
+         * Work queue counts for Upstairs and Downstairs
+         */
+        json(copyinstr(arg1), "ok.up_count"),
+        json(copyinstr(arg1), "ok.ds_count"),
+
+        /*
+         * Job ID and backpressure
+         */
+        json(copyinstr(arg1), "ok.next_job_id"),
+        json(copyinstr(arg1), "ok.up_backpressure"),
+        json(copyinstr(arg1), "ok.write_bytes_out"),
+
+        /*
+         * New jobs on the work list for each downstairs
+         */
+        json(copyinstr(arg1), "ok.ds_io_count.new[0]"),
+        json(copyinstr(arg1), "ok.ds_io_count.new[1]"),
+        json(copyinstr(arg1), "ok.ds_io_count.new[2]"),
+
+        /*
+         * In progress jobs on the work list for each downstairs
+         */
+        json(copyinstr(arg1), "ok.ds_io_count.in_progress[0]"),
+        json(copyinstr(arg1), "ok.ds_io_count.in_progress[1]"),
+        json(copyinstr(arg1), "ok.ds_io_count.in_progress[2]"),
+
+        /*
+         * Completed (done) jobs on the work list for each downstairs
+         */
+        json(copyinstr(arg1), "ok.ds_io_count.done[0]"),
+        json(copyinstr(arg1), "ok.ds_io_count.done[1]"),
+        json(copyinstr(arg1), "ok.ds_io_count.done[2]"),
+
+        /*
+         * Skipped jobs on the work list for each downstairs
+         */
+        json(copyinstr(arg1), "ok.ds_io_count.skipped[0]"),
+        json(copyinstr(arg1), "ok.ds_io_count.skipped[1]"),
+        json(copyinstr(arg1), "ok.ds_io_count.skipped[2]"),
+
+        /* Extents Repaired */
+        json(copyinstr(arg1), "ok.ds_extents_repaired[0]"),
+        json(copyinstr(arg1), "ok.ds_extents_repaired[1]"),
+        json(copyinstr(arg1), "ok.ds_extents_repaired[2]"),
+        /* Extents Confirmed */
+        json(copyinstr(arg1), "ok.ds_extents_confirmed[0]"),
+        json(copyinstr(arg1), "ok.ds_extents_confirmed[1]"),
+        json(copyinstr(arg1), "ok.ds_extents_confirmed[2]"));
+
+}
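
Note on the `json()` selectors used in single_up_info.d: each call picks one element out of the JSON string passed as `arg1`, using a dotted path with optional `[n]` array indexes. The Python sketch below mimics that lookup to make the paths easier to read. It is an illustration only, not part of this commit: the payload shape is inferred from the paths in the script, the sample values (including the session ID) are hypothetical, and `json_lookup` is a made-up helper name. Unlike the D subroutine, it returns native Python values rather than strings.

```python
import json
import re
from typing import Any

# A path token is either a key (no dots or brackets) or an index like [2].
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def json_lookup(payload: str, path: str) -> Any:
    """Resolve a path such as "ok.ds_io_count.new[1]" in a JSON string."""
    value = json.loads(payload)
    for key, index in _TOKEN.findall(path):
        # Exactly one of key/index is non-empty for each token.
        value = value[key] if key else value[int(index)]
    return value


# Hypothetical payload, shaped after the paths the script reads.
SAMPLE = json.dumps({
    "ok": {
        "session_id": "c0b92059-hypothetical-session-id",
        "ds_state": ["live_repair", "active", "active"],
        "up_count": 3,
        "ds_count": 435,
        "ds_io_count": {
            "new": [0, 0, 0],
            "in_progress": [40, 241, 241],
        },
    }
})

# Equivalent of substr(session_id, 0, 8) in the script.
print(json_lookup(SAMPLE, "ok.session_id")[:8])
print(json_lookup(SAMPLE, "ok.ds_state[0]"))
print(json_lookup(SAMPLE, "ok.ds_io_count.in_progress[1]"))
```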

tools/dtrace/sled_upstairs_info.d

+98
@@ -0,0 +1,98 @@
+/*
+ * Display internal Upstairs status.
+ * This is an ease of use script that can be run on a sled and will
+ * output stats for all propolis-server or pantry processes (anything
+ * that has an upstairs). The PID and SESSION will be unique for
+ * an upstairs. Multiple disks attached to a single propolis server
+ * will share the PID, but have unique SESSIONs.
+ */
+#pragma D option quiet
+#pragma D option strsize=1k
+/*
+ * Print the header right away
+ */
+dtrace:::BEGIN
+{
+    show = 21;
+}
+
+/*
+ * Every second, check and see if we have printed enough that it is
+ * time to print the header again
+ */
+tick-1s
+/show > 20/
+{
+    printf("%5s %8s ", "PID", "SESSION");
+    printf("%17s %17s %17s", "DS STATE 0", "DS STATE 1", "DS STATE 2");
+    printf(" %5s %5s %9s %5s", "UPW", "DSW", "NEXT_JOB", "BAKPR");
+    printf(" %10s", "WRITE_BO");
+    printf(" %5s %5s %5s", "NEW0", "NEW1", "NEW2");
+    printf(" %5s %5s %5s", "IP0", "IP1", "IP2");
+    printf(" %5s %5s %5s", "D0", "D1", "D2");
+    printf(" %5s %5s %5s", "S0", "S1", "S2");
+    printf(" %5s %5s %5s", "ER0", "ER1", "ER2");
+    printf(" %5s %5s %5s", "EC0", "EC1", "EC2");
+    printf("\n");
+    show = 0;
+}
+
+crucible_upstairs*:::up-status
+{
+    show = show + 1;
+    session_id = json(copyinstr(arg1), "ok.session_id");
+
+    /*
+     * I'm not very happy about this very long multi-line printf, but if
+     * we don't print it all on one line, then multiple sessions will
+     * clobber each other's output.
+     */
+    printf("%5d %8s %17s %17s %17s %5s %5s %9s %5s %10s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s\n",
+
+        pid,
+        substr(session_id, 0, 8),
+
+        /* State for the three downstairs */
+        json(copyinstr(arg1), "ok.ds_state[0]"),
+        json(copyinstr(arg1), "ok.ds_state[1]"),
+        json(copyinstr(arg1), "ok.ds_state[2]"),
+
+        /* Work queue counts for Upstairs and Downstairs */
+        json(copyinstr(arg1), "ok.up_count"),
+        json(copyinstr(arg1), "ok.ds_count"),
+
+        /* Job ID and backpressure */
+        json(copyinstr(arg1), "ok.next_job_id"),
+        json(copyinstr(arg1), "ok.up_backpressure"),
+        json(copyinstr(arg1), "ok.write_bytes_out"),
+
+        /* New jobs on the work list for each downstairs */
+        json(copyinstr(arg1), "ok.ds_io_count.new[0]"),
+        json(copyinstr(arg1), "ok.ds_io_count.new[1]"),
+        json(copyinstr(arg1), "ok.ds_io_count.new[2]"),
+
+        /* In progress jobs on the work list for each downstairs */
+        json(copyinstr(arg1), "ok.ds_io_count.in_progress[0]"),
+        json(copyinstr(arg1), "ok.ds_io_count.in_progress[1]"),
+        json(copyinstr(arg1), "ok.ds_io_count.in_progress[2]"),
+
+        /* Completed (done) jobs on the work list for each downstairs */
+        json(copyinstr(arg1), "ok.ds_io_count.done[0]"),
+        json(copyinstr(arg1), "ok.ds_io_count.done[1]"),
+        json(copyinstr(arg1), "ok.ds_io_count.done[2]"),
+
+        /* Skipped jobs on the work list for each downstairs */
+        json(copyinstr(arg1), "ok.ds_io_count.skipped[0]"),
+        json(copyinstr(arg1), "ok.ds_io_count.skipped[1]"),
+        json(copyinstr(arg1), "ok.ds_io_count.skipped[2]"),
+
+        /* Extents Repaired */
+        json(copyinstr(arg1), "ok.ds_extents_repaired[0]"),
+        json(copyinstr(arg1), "ok.ds_extents_repaired[1]"),
+        json(copyinstr(arg1), "ok.ds_extents_repaired[2]"),
+
+        /* Extents Confirmed */
+        json(copyinstr(arg1), "ok.ds_extents_confirmed[0]"),
+        json(copyinstr(arg1), "ok.ds_extents_confirmed[1]"),
+        json(copyinstr(arg1), "ok.ds_extents_confirmed[2]"));
+}
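
Note on reading sled_upstairs_info.d output: rows from different upstairs are interleaved, so the (PID, SESSION) pair is what ties successive samples together. As a rough illustration (not part of this commit), the Python sketch below groups rows by that pair and reports how far NEXT_JOB advanced between consecutive samples. It assumes the 28 whitespace-separated fields shown in the README example and that downstairs state names contain no spaces; `next_job_deltas` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FIELD_COUNT = 28      # PID, SESSION, 3 states, 5 counters, 6 groups of 3
NEXT_JOB_FIELD = 7    # zero-based position of the NEXT_JOB column


def next_job_deltas(text: str) -> Dict[Tuple[int, str], List[int]]:
    """Return NEXT_JOB increments per (PID, SESSION) between samples."""
    last: Dict[Tuple[int, str], int] = {}
    deltas: Dict[Tuple[int, str], List[int]] = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # The header has extra tokens ("DS STATE 0" splits in three),
        # so the field count check skips it as well as blank lines.
        if len(fields) != FIELD_COUNT or fields[0] == "PID":
            continue
        key = (int(fields[0]), fields[1])
        next_job = int(fields[NEXT_JOB_FIELD])
        if key in last:
            deltas[key].append(next_job - last[key])
        last[key] = next_job
    return dict(deltas)


# Four rows copied from the README example output.
SAMPLE = """\
15579 c0b92059 live_repair active active 3 367 656347 1616 185597952 0 0 0 75 195 195 69 172 172 223 0 0 9589 0 0 0 0 0
15579 a666a8bd live_repair active active 2 95 91960 0 0 0 0 0 2 1 1 93 94 94 0 0 0 7827 0 0 11667 0 0
15579 c0b92059 live_repair active active 3 432 656863 2077 203423744 0 0 0 104 168 168 68 264 264 260 0 0 9590 0 0 0 0 0
15579 a666a8bd live_repair active active 2 99 91964 0 0 0 0 0 2 1 1 97 98 98 0 0 0 7828 0 0 11667 0 0
"""

for key, values in next_job_deltas(SAMPLE).items():
    print(key, values)
```

In this sample both sessions share PID 15579, which matches the script's header comment that multiple disks attached to one propolis server share a PID but have unique SESSIONs.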
