Create test framework and setup initial integration tests #186
Conversation
There are some minor points at the clippy level, but overall I think it's good.
Hey, I'll need a bit more time to progress on this; there have been some issues with my setup, so I will need some time to fix it and get back on this 😓 Will keep updating here.
Looks good so far!
Hey, I have been trying to integrate the framework and the original lifecycle tests, but I am running into some issues. When running tests, the youki process seems to block on creating the container, and in ps -a there are several defunct youki processes and several running youki processes. I am trying to uncover what could be causing this, but if any of you have any idea, help would be really appreciated.
@YJDoc2 How about first checking if you can run the tutorial with the latest version of the current main branch? |
Yes, I did check that with the latest, and I found that all except the delete test run OK, similar to what you encountered in the original PR... Could you verify this on the main branch when you get a chance? I am still looking into the new tests; hopefully I will figure out what is going on. Thanks :)
It seems that even when running manually, youki create leaves a youki process running after it has finished creating the container and the main process has exited. This might be one of the forked/cloned processes, but is it intended to be left running by the create command? Shouldn't the create command create the container structure and state file, then exit?
Here is the result of the tutorial at my local.
$ sudo ./youki create -b tutorial tutorial_container
[DEBUG src/container/init_builder.rs:89] 2021-08-10T21:25:45.631668315+09:00 container directory will be "/run/youki/tutorial_container"
[DEBUG src/container/container.rs:84] 2021-08-10T21:25:45.634070322+09:00 Save container status: Container { state: State { oci_version: "v1.0.2", id: "tutorial_container", status: Creating, pid: None, bundle: "tutorial", annotations: {}, created: None, creator: None, use_systemd: None }, root: "/run/youki/tutorial_container" } in "/run/youki/tutorial_container"
[INFO src/cgroups/common.rs:137] 2021-08-10T21:25:45.636864610+09:00 cgroup manager V2 will be used
[DEBUG src/container/builder_impl.rs:96] 2021-08-10T21:25:45.637578892+09:00 init pid is Pid(7861)
[DEBUG src/rootfs.rs:39] 2021-08-10T21:25:45.637848301+09:00 mount root fs "/home/utam0k/ghq/github.com/utam0k/youki/tutorial/rootfs"
[WARN src/rootfs.rs:54] 2021-08-10T21:25:45.638516055+09:00 A feature of cgroup is unimplemented.
[DEBUG src/capabilities.rs:21] 2021-08-10T21:25:45.638848197+09:00 reset all caps
[DEBUG src/capabilities.rs:28] 2021-08-10T21:25:45.638903740+09:00 dropping bounding capabilities to Some([CAP_AUDIT_WRITE, CAP_KILL, CAP_NET_BIND_SERVICE])
[WARN src/syscall/linux.rs:132] 2021-08-10T21:25:45.638955887+09:00 CAP_CHECKPOINT_RESTORE is not supported.
[WARN src/syscall/linux.rs:132] 2021-08-10T21:25:45.638995798+09:00 CAP_PERFMON is not supported.
[WARN src/syscall/linux.rs:132] 2021-08-10T21:25:45.639007839+09:00 CAP_BPF is not supported.
[DEBUG src/process/parent.rs:78] 2021-08-10T21:25:45.639215542+09:00 [child to parent] sending child ready
[DEBUG src/process/parent.rs:174] 2021-08-10T21:25:45.639243645+09:00 received child ready message
[WARN src/cgroups/v2/manager.rs:106] 2021-08-10T21:25:45.639297271+09:00 Controller rdma is not yet implemented.
[DEBUG src/cgroups/v2/hugetlb.rs:12] 2021-08-10T21:25:45.653961059+09:00 Apply hugetlb cgroup v2 config
[DEBUG src/cgroups/v2/io.rs:17] 2021-08-10T21:25:45.653993236+09:00 Apply io cgrup v2 config
[DEBUG src/cgroups/v2/pids.rs:14] 2021-08-10T21:25:45.654007640+09:00 Apply pids cgroup v2 config
[DEBUG src/container/container.rs:84] 2021-08-10T21:25:45.654027336+09:00 Save container status: Container { state: State { oci_version: "v1.0.2", id: "tutorial_container", status: Created, pid: Some(7861), bundle: "tutorial", annotations: {}, created: Some(2021-08-10T12:25:45.654019732Z), creator: Some(0), use_systemd: Some(false) }, root: "/run/youki/tutorial_container" } in "/run/youki/tutorial_container"
$ sudo ./youki start tutorial_container
[DEBUG src/notify_socket.rs:69] 2021-08-10T21:25:49.381825497+09:00 notify container start
[DEBUG src/notify_socket.rs:74] 2021-08-10T21:25:49.381877180+09:00 notify finished
[DEBUG src/container/container.rs:84] 2021-08-10T21:25:49.381899527+09:00 Save container status: Container { state: State { oci_version: "v1.0.2", id: "tutorial_container", status: Running, pid: Some(7861), bundle: "tutorial", annotations: {}, created: Some(2021-08-10T12:25:45.6540[DEBUG src/notify_socket.rs:44] 2021-08-10T21:25:49.381932066+09:00 received: start container} in "/run/youki/tutorial_container"
$ ps auwx | grep youki
$ sudo ./youki kill tutorial_container 9
Error: tutorial_container could not be killed because it was Stopped
$ sudo ./youki delete tutorial_container
[DEBUG src/commands/delete.rs:23] 2021-08-10T21:26:05.933858612+09:00 start deleting tutorial_container
[DEBUG src/commands/delete.rs:32] 2021-08-10T21:26:05.933922543+09:00 load the container from "/run/youki/tutorial_container"
[DEBUG src/commands/delete.rs:41] 2021-08-10T21:26:05.934157160+09:00 container status: Stopped
[DEBUG src/commands/delete.rs:45] 2021-08-10T21:26:05.934176906+09:00 load spec from "/run/youki/tutorial_container/config.json"
[DEBUG src/commands/delete.rs:47] 2021-08-10T21:26:05.935536074+09:00 spec: Spec { version: "1.0.2-dev", root: Some(Root { path: "/home/utam0k/ghq/github.com/utam0k/youki/tutorial/rootfs", readonly: Some(true) }), mounts: Some([Mount { destination: "/proc", typ: Some("proc"), source: Some("proc"), options: None }, Mount { destination: "/dev", typ: Some("tmpfs"), source: Some("tmpfs"), options: Some(["nosuid", "strictatime", "mode=755", "size=65536k"]) }, Mount { destination: "/dev/pts", typ: Some("devpts"), source: Some("devpts"), options: Some(["nosuid", "noexec", "newinstance", "ptmxmode=0666", "mode=0620", "gid=5"]) }, Mount { destination: "/dev/shm", typ: Some("tmpfs"), source: Some("shm"), options: Some(["nosuid", "noexec", "nodev", "mode=1777", "size=65536k"]) }, Mount { destination: "/dev/mqueue", typ: Some("mqueue"), source: Some("mqueue"), options: Some(["nosuid", "noexec", "nodev"]) }, Mount { destination: "/sys", typ: Some("sysfs"), source: Some("sysfs"), options: Some(["nosuid", "noexec", "nodev", "ro"]) }, Mount { destination: "/sys/fs/cgroup", typ: Some("cgroup"), source: Some("cgroup"), options: Some(["nosuid", "noexec", "nodev", "relatime", "ro"]) }]), process: Some(Process { terminal: Some(false), console_size: None, user: User { uid: 0, gid: 0, additional_gids: None, username: None }, args: Some(["sleep", "1"]), command_line: None, env: Some(["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "TERM=xterm"]), cwd: "/", capabilities: Some(LinuxCapabilities { bounding: Some([CAP_AUDIT_WRITE, CAP_KILL, CAP_NET_BIND_SERVICE]), effective: Some([CAP_AUDIT_WRITE, CAP_KILL, CAP_NET_BIND_SERVICE]), inheritable: Some([CAP_AUDIT_WRITE, CAP_KILL, CAP_NET_BIND_SERVICE]), permitted: Some([CAP_AUDIT_WRITE, CAP_KILL, CAP_NET_BIND_SERVICE]), ambient: Some([CAP_AUDIT_WRITE, CAP_KILL, CAP_NET_BIND_SERVICE]) }), rlimits: Some([LinuxRlimit { typ: RlimitNofile, hard: 1024, soft: 1024 }]), no_new_privileges: Some(true), apparmor_profile: None, 
oom_score_adj: None, selinux_label: None }), hostname: Some("runc"), hooks: None, annotations: None, linux: Some(Linux { uid_mappings: None, gid_mappings: None, sysctl: None, resources: Some(LinuxResources { devices: Some([LinuxDeviceCgroup { allow: false, typ: None, major: None, minor: None, access: Some("rwm") }]), memory: None, cpu: None, pids: None, block_io: None, hugepage_limits: None, network: None, rdma: None, unified: None, disable_oom_killer: false, oom_score_adj: None, freezer: None }), cgroups_path: None, namespaces: Some([LinuxNamespace { typ: Pid, path: None }, LinuxNamespace { typ: Network, path: None }, LinuxNamespace { typ: Ipc, path: None }, LinuxNamespace { typ: Uts, path: None }, LinuxNamespace { typ: Mount, path: None }]), devices: None, seccomp: None, rootfs_propagation: None, masked_paths: Some(["/proc/acpi", "/proc/asound", "/proc/kcore", "/proc/keys", "/proc/latency_stats", "/proc/timer_list", "/proc/timer_stats", "/proc/sched_debug", "/sys/firmware", "/proc/scsi"]), readonly_paths: Some(["/proc/bus", "/proc/fs", "/proc/irq", "/proc/sys", "/proc/sysrq-trigger"]), mount_label: None, intel_rdt: None, personality: None }), solaris: None, windows: None, vm: None }
[DEBUG src/commands/delete.rs:50] 2021-08-10T21:26:05.935601697+09:00 remove dir "/run/youki/tutorial_container"
[INFO src/cgroups/common.rs:137] 2021-08-10T21:26:05.938330659+09:00 cgroup manager V2 will be used
[DEBUG src/cgroups/v2/manager.rs:145] 2021-08-10T21:26:05.938353257+09:00 remove cgroup "/sys/fs/cgroup/youki/tutorial_container"

It is expected behavior that the process for the container remains after create has executed. That process waits to be signaled by the start command to actually run the container. Are there any other processes left apart from this one?
Hey, so I figured out what was going wrong with my original code, and am almost done fixing it. After that I will refactor it a bit and update this PR, roughly by tomorrow. No, there is no other process remaining apart from that one; I was a bit confused about the flow originally.
Sorry for the delay, but my setup is still having some issues, so development is a little slow 😓
@YJDoc2 Wow! Wonderful!
Add TestableGroup trait
@utam0k @tsturzl I feel some more refactoring and clean up is needed, but are any other changes required? Thanks. |
@YJDoc2 Cool! Is it possible to test each group in parallel? If so, the output may be difficult to read.
Well, but I might be able to think about this later. I think I'm generally good. |
It should be possible to run test groups in parallel: since groups return test results which get printed by a separate function, we can collect them and print them all after running. My concern would be whether the generate-uuid function might clash when running in parallel. Sorry to ask 😅, but were you saying that you're generally good with the parallelizing code, or that you're fine with the overall state of the current PR?
Also, 'youki' is currently hard-coded; I will update it later to take command-line parameters instead, along with some other improvements I think are needed.
Yes, I agree with you. I think the minimum units should be grouped and parallelized because of the test lifecycle and other factors. I think the uuid collision is probably OK, but what are you concerned about?
Oops, sorry. I meant to say that I think it's OK to defer parallel processing as a future issue.
👍
Now supports -r (required) to take the runtime path and -t (optional) to take the selected tests to run
Changed -r to --root, as runc does not take a -r command-line option
Hey, sorry for the delay.
Running all tests
running selected tests from create and all lifecycle tests
Now it follows the path of the runtime given, rather than hard-coding 'youki'. I have also tried this with runc, and it passes the lifecycle and create tests 🎉 🎉 🎉
@YJDoc2
I think this is fine. I don't actively want to use
I need to add a bit of README documentation for the test framework, so that people can understand how the different structs are to be used. Similarly, I want to add an md file showing a template procedure for adding a new test group, so people can understand how to add a test group in the actual tests. I also feel we should add one of the tests, such as huge tlb or bulk mem from the OCI runtime tools, before we merge, and open tickets for porting the rest. I am ready to do that; I was just waiting to see if any changes are required at the current stage.
Yes, I think these should definitely be done, but they should be addressed in a separate PR.
Very nice. I totally agree.
👍
Hey, I was checking the Huge TLB tests in the OCI runtime tools, and we definitely need to set up utils before we open the tickets for porting tests (in this or a different PR). By utils I mean the functions and structs needed for setting up cgroups, rlimits, etc. Can you suggest an approach we could take for this? In the runtime tools there is a generator struct which can be configured to set individual fields of the bundle's config; can we take a similar approach, or is there some existing way we can do this? (Sorry, I am not very familiar with the cgroup setup in youki yet 😅)
@YJDoc2 Does this refer to the example for the device group?
I think that is exactly what is needed! But given that it has been separated from this repo on the main branch, yet is still part of the test_framework branch, I think it would be better if we merge this PR, I make another one where I set up those utils and a hugetlb test as a template test, and then open tickets and a tracking group for the rest of the tests. What do you think? @utam0k @tsturzl I would like both of you to take a final look at this PR if we decide to merge it as it is right now, and defer the utils and hugetlb to the next PR.
@YJDoc2
Codecov Report
@@ Coverage Diff @@
## main #186 +/- ##
==========================================
+ Coverage 39.85% 40.89% +1.04%
==========================================
Files 15 15
Lines 1405 1445 +40
Branches 385 410 +25
==========================================
+ Hits 560 591 +31
+ Misses 665 649 -16
+ Partials 180 205 +25
…directories, and run fmt and clippy only in those folders in which the PR makes changes
So I updated the workflow as required, and also added an extra job. On another note: when I was running clippy locally, it showed some warnings in src/* that should fail the action, but they still do not. Those clippy warnings should probably be fixed in some other PR. As for the one failing test action: the details show that one of the tests was sent a hang-up signal, which caused the tests to fail; when running locally, all tests pass with cargo test. I'm not sure why the test would fail, as I have not changed anything in the main source code. @utam0k
@YJDoc2
As for clippy, the latest stable version (1.54.0) I have was fine. Is there any way to reproduce this?
Hmm... The main branch doesn't have any CI failures. I ran CI after deleting the changes made in this PR in
@utam0k Hey, I updated the PR. It seems I made some mistakes in the comments in main.yaml, which caused some issues. In the latest commit, I removed all the comments in my main.yaml, and it seems to pass all the checks. Interestingly, I updated my toolchain today, and in the updated version those lints seem to be removed, so it was a version issue on my side; apologies.
Also, I am not sure why it might be happening, but even with caching, the test workflow seems to recompile all the crates for each step in the job. I checked the logs of the test job as it was running, and it seemed to recompile all dependencies in each step, due to which the test job takes the most time, on average about 11 minutes, even though the tests finish running within a minute locally. If all else is fine, I think we should merge this PR and open tracking issues for implementing the rest of the stuff, including the utils section I was talking about, and then the rest of the tests.
I too have my doubts about this. It could be related to a missing option, or it could be that the settings were not working properly to begin with. Yes, I would like to investigate this separately as well.
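For reference, a typical cargo cache step with actions/cache looks roughly like this; if the key or the cached paths don't match what cargo actually writes, every job step recompiles from scratch. This is a generic sketch using the commonly recommended paths, not necessarily what this repo's workflow does:

```yaml
# Generic sketch of cargo caching with actions/cache; the paths and key are
# the commonly recommended ones, not necessarily this repo's workflow.
- uses: actions/cache@v2
  with:
    path: |
      ~/.cargo/registry
      ~/.cargo/git
      target
    key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
```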
LGTM except for a comment.
I did not change anything except the author section in Cargo.toml, yet the tests seem to fail 😭 😭 😭 Even when running locally, a test sometimes seems to fail randomly, but not the one that failed here: in the GitHub action, test_run_hook failed, while in my local tests, sometimes test process::channel::tests::test_channel_intermadiate_ready fails. I will try to check why this happens tomorrow.
Sorry, this probably has nothing to do with this PR. I've created the PR to fix it. #267
ref : #56
This is a work-in-progress PR for creating the test framework as well as implementing the initial integration tests.
Once this is merged, we can create a tracking issue for implementing the rest of the integration tests.