Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a clear_with method to each data structure. #95

Closed
wants to merge 3 commits into from

Conversation

novacrazy
Copy link

@novacrazy novacrazy commented Nov 27, 2024

I have a library that uses both the intrusive RBTree and the intrusive linked list, using UnsafeRef to share the node between them. However, because UnsafeRef doesn't implement dropping the pointer on its own, I currently use fast_clear on the tree and then remove the front item of the list in a loop, using UnsafeRef::into_box to drop it, then continue until no elements remain.

This loop of calling remove is sub-optimal because it hooks up the list nodes again despite only iterating forward to drop every node.

It's also infeasible/impossible to write my own UnsafeRef due to the heavy reliance on DefaultPointerOps.

Instead, I propose a clear_with method to provide a function to drop the pointers with custom behavior, with the regular clear being implemented as simply clear_with(drop). Codegen is identical in a quick test. This provides the flexibility I need while remaining fast and safe.

So now my own clear method looks like this:

pub fn clear(&mut self) {
    self.tree.fast_clear();

    self.list.clear_with(|node| {
        // SAFETY: Node is removed from both the tree and list
        let _ = unsafe { UnsafeRef::into_box(node) };
    });
}

I also fixed a handful of typos my IDE noticed.

@novacrazy
Copy link
Author

After vs Before this PR assembly for clearing my structure. Pretty handy.

dfgdfhdsfhdfh

@Amanieu
Copy link
Owner

Amanieu commented Nov 28, 2024

Can you achieve the same thing with .take().into_iter() and then manually freeing each pointer using the iterator?

@novacrazy
Copy link
Author

That's a little bit better but the IntoIter still calls pop_front which calls remove, which results in the following assembly:

test_clear_into_iter:
        push rsi
        sub rsp, 32
        mov qword ptr [rcx + 16], 0
        mov rax, rcx
        vxorps xmm0, xmm0, xmm0
        mov rcx, qword ptr [rcx]
        vmovups xmmword ptr [rax], xmm0
        test rcx, rcx
        jne .LBB4_1
.LBB4_6:
        add rsp, 32
        pop rsi
        ret
.LBB4_5:
        mov qword ptr [rcx], 1
        add rcx, -16
        mov edx, 64
        mov r8d, 8
        call __rust_dealloc
        mov rcx, rsi
        test rsi, rsi
        je .LBB4_6
.LBB4_1:
        mov rsi, qword ptr [rcx]
        mov rax, qword ptr [rcx + 8]
        test rsi, rsi
        je .LBB4_3
        mov qword ptr [rsi + 8], rax
.LBB4_3:
        test rax, rax
        je .LBB4_5
        mov qword ptr [rax], rsi
        jmp .LBB4_5

versus the single loop produced by clear_with:

test_clear_with:
        push rsi
        push rdi
        sub rsp, 40
        mov qword ptr [rcx + 16], 0
        vxorps xmm0, xmm0, xmm0
        mov r8, qword ptr [rcx]
        vmovups xmmword ptr [rcx], xmm0
        test r8, r8
        je .LBB2_3
        mov rsi, qword ptr [rip + __imp_HeapFree]
.LBB2_2:
        mov rdi, qword ptr [r8]
        mov qword ptr [r8], 1
        add r8, -16
        xor edx, edx
        mov rcx, qword ptr [rip + std::sys::alloc::windows::HEAP.0]
        call rsi
        mov r8, rdi
        test rdi, rdi
        jne .LBB2_2
.LBB2_3:
        add rsp, 40
        pop rdi
        pop rsi
        ret

also it's curious why clear_with chooses to call HeapFree while the iter version calls __rust_dealloc

@Amanieu
Copy link
Owner

Amanieu commented Nov 28, 2024

I think it would be better to optimize into_iter to not update links when iterating.

@novacrazy
Copy link
Author

I just tried to replace pop_front in the iterator with code similar to that in clear, but it still seems to add some extra branches as part of the iterator usage:

pub unsafe fn pop_front_fast(&mut self) -> Option<<A::PointerOps as PointerOps>::Pointer> {
    use link_ops::LinkOps;

    if let Some(current) = self.head {
        let next = self.adapter.link_ops().next(current);

        self.adapter.link_ops_mut().release_link(current);

        let res = Some(
            self.adapter
                .pointer_ops()
                .from_raw(self.adapter.get_value(current)),
        );

        self.head = next;

        res
    } else {
        None
    }
}

with that used in IntoIter::next instead of pop_front

New assembly
test_clear:
	push rsi
	push rdi
	push rbx
	sub rsp, 32
	mov qword ptr [rcx + 16], 0
	vxorps xmm0, xmm0, xmm0
	mov rsi, qword ptr [rcx]
	mov r8, qword ptr [rcx + 8]
	vmovups xmmword ptr [rcx], xmm0
	test r8, r8
	je .LBB2_3
	mov rdi, qword ptr [rip + __imp_HeapFree]
.LBB2_2:
	mov rbx, qword ptr [r8 + 8]
	mov qword ptr [r8], 1
	add r8, -16
	xor edx, edx
	mov rcx, qword ptr [rip + std::sys::alloc::windows::HEAP.0]
	call rdi
	mov r8, rbx
	test rbx, rbx
	jne .LBB2_2
.LBB2_3:
	test rsi, rsi
	je .LBB2_5
.LBB2_4:
	mov rax, qword ptr [rsi]
	mov qword ptr [rsi], 1
	mov rsi, rax
	test rax, rax
	jne .LBB2_4
.LBB2_5:
	add rsp, 32
	pop rbx
	pop rdi
	pop rsi
	ret

@novacrazy
Copy link
Author

I could probably implement fold on IntoIter Iterator with the same code as clear_with, so at least for_each would work better.

@novacrazy
Copy link
Author

Oh, I made a mistake and used a pop_back_fast version in next() of the iterator instead. using pop_front_fast actually results in this asm:

test_clear:
	push rsi
	push rdi
	sub rsp, 40
	mov qword ptr [rcx + 16], 0
	vxorps xmm0, xmm0, xmm0
	mov r8, qword ptr [rcx]
	vmovups xmmword ptr [rcx], xmm0
	test r8, r8
	je .LBB2_3
	mov rsi, qword ptr [rip + __imp_HeapFree]
.LBB2_2:
	mov rdi, qword ptr [r8]
	mov qword ptr [r8], 1
	add r8, -16
	xor edx, edx
	mov rcx, qword ptr [rip + std::sys::alloc::windows::HEAP.0]
	call rsi
	mov r8, rdi
	test rdi, rdi
	jne .LBB2_2
.LBB2_3:
	add rsp, 40
	pop rdi
	pop rsi
	ret

which is ideal.

My mistake.

I'll clean this up and push changes instead of using clear_with

@novacrazy
Copy link
Author

I have no idea how to implement fold on the RBTree IntoIter, so the specialization of fold may only be for the lists.

@novacrazy
Copy link
Author

Honestly after experimenting with the iterator optimization today, I still think a clear_with is easier and cleaner. We can more easily document how panicking can lead to memory leaks, for example.

I can try again eventually, but I may not have time to explore alternatives myself more.

@Amanieu
Copy link
Owner

Amanieu commented Nov 29, 2024

I have no idea how to implement fold on the RBTree IntoIter, so the specialization of fold may only be for the lists.

This could be implemented using recursion (visit left subtree, then self, then right subtree), but it's not obvious that this would be faster in practice.

Honestly after experimenting with the iterator optimization today, I still think a clear_with is easier and cleaner. We can more easily document how panicking can lead to memory leaks, for example.

I don't think this justifies adding a whole new API. I would much prefer just making iterators better.

@novacrazy
Copy link
Author

Then for now I'm just going to get away with using:

fn clear_list<K, V>(list: &mut LinkedList<NodeListAdapter<K, V>>) {
    let mut front = list.front();

    while let Some(ptr) = front.clone_pointer() {
        front.move_next();

        let _ = unsafe { UnsafeRef::into_box(ptr) };
    }

    // ensure LinkedList::drop does nothing
    list.fast_clear();
}

for my personal use.

@novacrazy novacrazy closed this Nov 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants