Commit d900e29

Improve slow path performance for allocation (microsoft#143)
* Remote dealloc refactor.

* Improve remote dealloc
  Change remote to count down to 0, so the fast path does not need a constant.
  Use a signed value so that the branch does not depend on an addition.

* Inline remote_dealloc
  The fast path of remote_dealloc is sufficiently compact that it can be inlined.

* Improve fast path in Slab::alloc
  Turn the internal structure into tail calls, to improve the fast path.
  Should be no algorithmic changes.

* Refactor initialisation to help fast path.
  Break lazy initialisation into two functions, so it is easier to codegen fast paths.

* Minor tidy to statically sized dealloc.

* Refactor semi-slow path for alloc
  Make the backup path a bit faster. The only algorithmic change is to delay
  checking for first allocation. Otherwise, should be unchanged.

* Test initial operation of a thread
  The first operation a new thread takes is special. It results in allocating
  an allocator, and swinging it into the TLS. This makes this a very special
  path, that is rarely tested. This test generates a lot of threads to cover
  the first alloc and dealloc operations.

* Correctly handle reusing get_noncachable

* Fix large alloc stats
  Large alloc stats aren't necessarily balanced on a thread. This changes to
  tracking individual pushes and pops, rather than the net effect (with an
  unsigned value).

* Fix TLS init on large alloc path

* Add Bump ptrs to allocator
  Each allocator has a bump ptr for each size class. This is no longer slab
  local. Slabs that haven't been fully allocated no longer need to be in the
  DLL for this sizeclass.

* Change to a cyclic non-empty list
  This change reduces the branching in the case of finding a new free list.
  Using a non-empty cyclic list enables branch-free add, and a single branch
  in remove to detect the empty case.

* Update differences

* Rename first allocation
  Use "needs initialisation", as it makes more sense for other scenarios.

* Use a ptrdiff to help with zero init.

* Make GlobalPlaceholder zero init
  The GlobalPlaceholder allocator is now a zero init block of memory. This
  removes various issues for when things are initialised. It is made
  read-only so that we detect writes to it on some platforms.
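
The "count down to 0" point in the message can be illustrated with a small sketch. This is not the code from the commit: `RemoteBudget`, `reserve`, `refill`, and the batch size of 1024 are hypothetical names and values chosen for illustration, and starting the counter at zero is an assumption made to match the zero-init theme of the later bullets. The idea shown is that the fast path compares the live counter against the request size directly, so no threshold constant is loaded and no addition has to complete before the branch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a remote-deallocation budget. The names and the
// batch size are assumptions for illustration, not snmalloc's definitions.
struct RemoteBudget
{
  static constexpr std::int64_t BATCH = 1024; // assumed refill amount (bytes)

  // Starts at zero, so the very first call is sent to the slow path.
  std::int64_t capacity = 0;

  // Fast path: a single signed comparison against the live counter. The
  // subtraction only happens once the branch outcome is already known.
  bool reserve(std::size_t size)
  {
    assert(size <= static_cast<std::size_t>(INT64_MAX));
    const auto s = static_cast<std::int64_t>(size);
    const bool ok = capacity > s;
    if (ok)
      capacity -= s;
    return ok;
  }

  // Slow path: the caller has flushed its queued frees, so begin a new batch.
  void refill()
  {
    capacity = BATCH;
  }
};
```

A caller would try `reserve(size)` first and fall back to flushing plus `refill()` when it returns false. Counting upwards instead would need something like `used + size < LIMIT`, which puts both an addition and a constant in front of the branch.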
1 parent ecef894 commit d900e29

20 files changed, +688 -237 lines changed

difference.md

+4-1
@@ -33,7 +33,10 @@ This document outlines the changes that have diverged from
   4. We now store a direct pointer to the next element in each slabs free list
      rather than a relative offset into the slab. This enables list
      calculation on the fast path.
-
+
+  5. There is a single bump-ptr per size class that is part of the
+     allocator structure. The per size class slab list now only contains slabs
+     with free list, and not if it only has a bump ptr.

 [2-4] Are changes that are directly inspired by
 (mimalloc)[http://github.com/microsoft/mimalloc].
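
Item 5 can be pictured with a minimal sketch, under stated assumptions. `BumpSketch`, the four size classes (16 to 128 bytes), and the 256-byte slab size are all hypothetical, and free lists are left out entirely. It only shows the structural point: the bump pointers live in the allocator, one per size class, rather than inside each slab.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical layout: sizes, names, and slab handling are assumptions made
// for illustration and do not mirror snmalloc's real structures.
class BumpSketch
{
  static constexpr std::size_t NUM_SIZECLASSES = 4;
  static constexpr std::size_t SLAB_SIZE = 256;

  // One bump region per size class, owned by the allocator itself.
  struct Bump
  {
    char* cur = nullptr;
    char* end = nullptr;
  };

  Bump bump[NUM_SIZECLASSES];
  std::vector<std::unique_ptr<char[]>> slabs;

public:
  // Assumed size table: 16, 32, 64, 128 bytes.
  static std::size_t size_of(std::size_t sizeclass)
  {
    return std::size_t{16} << sizeclass;
  }

  std::size_t slab_count() const
  {
    return slabs.size();
  }

  void* alloc(std::size_t sizeclass)
  {
    assert(sizeclass < NUM_SIZECLASSES);
    Bump& b = bump[sizeclass];
    // cur == end covers both "region exhausted" and "never initialised"
    // (both pointers null), so no separate first-use check is needed.
    if (b.cur == b.end)
    {
      slabs.push_back(std::make_unique<char[]>(SLAB_SIZE));
      b.cur = slabs.back().get();
      b.end = b.cur + SLAB_SIZE;
    }
    void* p = b.cur;
    b.cur += size_of(sizeclass);
    return p;
  }
};
```

In this sketch a partially used slab is reachable only through its size class's bump region, which is the sense in which such slabs no longer need to sit in the per-size-class list.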

src/ds/address.h

+20
@@ -24,6 +24,15 @@ namespace snmalloc
     return reinterpret_cast<T*>(reinterpret_cast<char*>(base) + diff);
   }

+  /**
+   * Perform pointer arithmetic and return the adjusted pointer.
+   */
+  template<typename T>
+  inline T* pointer_offset_signed(T* base, ptrdiff_t diff)
+  {
+    return reinterpret_cast<T*>(reinterpret_cast<char*>(base) + diff);
+  }
+
   /**
    * Cast from a pointer type to an address.
    */
@@ -125,4 +134,15 @@ namespace snmalloc
     return static_cast<size_t>(
       static_cast<char*>(cursor) - static_cast<char*>(base));
   }
+
+  /**
+   * Compute the difference in pointers in units of char. This can be used
+   * across allocations.
+   */
+  inline ptrdiff_t pointer_diff_signed(void* base, void* cursor)
+  {
+    return static_cast<ptrdiff_t>(
+      static_cast<char*>(cursor) - static_cast<char*>(base));
+  }
+
 } // namespace snmalloc
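
The two helpers added above are inverses of each other, which a short standalone sketch can show. The function bodies are copied from the hunks into a hypothetical `sketch` namespace so the block compiles on its own. `round_trips` is an illustrative helper that is not part of the commit, and the example stays inside a single array so the pointer subtraction is well defined in standard C++.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch
{
  // Copied from the diff: move a pointer by a signed byte offset.
  template<typename T>
  inline T* pointer_offset_signed(T* base, std::ptrdiff_t diff)
  {
    return reinterpret_cast<T*>(reinterpret_cast<char*>(base) + diff);
  }

  // Copied from the diff: signed byte distance from base to cursor.
  inline std::ptrdiff_t pointer_diff_signed(void* base, void* cursor)
  {
    return static_cast<std::ptrdiff_t>(
      static_cast<char*>(cursor) - static_cast<char*>(base));
  }

  // Hypothetical helper: applying the measured distance to `from` should
  // land exactly on `to`, whether `to` is above or below `from`.
  inline bool round_trips(int* from, int* to)
  {
    const std::ptrdiff_t d = pointer_diff_signed(from, to);
    return pointer_offset_signed(from, d) == to;
  }
} // namespace sketch
```

The distance is negative when the cursor lies below the base, which an unsigned `size_t` result cannot represent.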

src/ds/bits.h

+2-2
@@ -329,7 +329,7 @@ namespace snmalloc
    *
    * `std::min` is in `<algorithm>`, so pulls in a lot of unneccessary code
    * We write our own to reduce the code that potentially needs reviewing.
-   **/
+   */
   template<typename T>
   constexpr inline T min(T t1, T t2)
   {
@@ -341,7 +341,7 @@ namespace snmalloc
    *
    * `std::max` is in `<algorithm>`, so pulls in a lot of unneccessary code
    * We write our own to reduce the code that potentially needs reviewing.
-   **/
+   */
   template<typename T>
   constexpr inline T max(T t1, T t2)
   {

src/ds/cdllist.h

+122
@@ -0,0 +1,122 @@
+#pragma once
+
+#include "defines.h"
+
+#include <cstdint>
+#include <type_traits>
+
+namespace snmalloc
+{
+  /**
+   * Special class for cyclic doubly linked non-empty linked list
+   *
+   * This code assumes there is always one element in the list. The client
+   * must ensure there is a sentinal element.
+   */
+  class CDLLNode
+  {
+    /**
+     * to_next is used to handle a zero initialised data structure.
+     * This means that `is_empty` works even when the constructor hasn't
+     * been run.
+     */
+    ptrdiff_t to_next = 0;
+
+    // TODO: CHERI will need a real pointer too
+    // CDLLNode* next = nullptr;
+    CDLLNode* prev = nullptr;
+
+    void set_next(CDLLNode* c)
+    {
+      // TODO: CHERI will need a real pointer too
+      // next = c;
+      to_next = pointer_diff_signed(this, c);
+    }
+
+  public:
+    /**
+     * Single element cyclic list. This is the empty case.
+     */
+    CDLLNode()
+    {
+      set_next(this);
+      prev = this;
+    }
+
+    SNMALLOC_FAST_PATH bool is_empty()
+    {
+      return to_next == 0;
+    }
+
+    /**
+     * Removes this element from the cyclic list is it part of.
+     */
+    SNMALLOC_FAST_PATH void remove()
+    {
+      SNMALLOC_ASSERT(!is_empty());
+      debug_check();
+      get_next()->prev = prev;
+      prev->set_next(get_next());
+      // As this is no longer in the list, check invariant for
+      // neighbouring element.
+      get_next()->debug_check();
+
+#ifndef NDEBUG
+      set_next(nullptr);
+      prev = nullptr;
+#endif
+    }
+
+    SNMALLOC_FAST_PATH CDLLNode* get_next()
+    {
+      // TODO: CHERI will require a real pointer
+      // return next;
+      return pointer_offset_signed(this, to_next);
+    }
+
+    SNMALLOC_FAST_PATH CDLLNode* get_prev()
+    {
+      return prev;
+    }
+
+    SNMALLOC_FAST_PATH void insert_next(CDLLNode* item)
+    {
+      debug_check();
+      item->set_next(get_next());
+      get_next()->prev = item;
+      item->prev = this;
+      set_next(item);
+      debug_check();
+    }
+
+    SNMALLOC_FAST_PATH void insert_prev(CDLLNode* item)
+    {
+      debug_check();
+      item->prev = prev;
+      prev->set_next(item);
+      item->set_next(this);
+      prev = item;
+      debug_check();
+    }
+
+    /**
+     * Checks the lists invariants
+     *   x->next->prev = x
+     * for all x in the list.
+     */
+    void debug_check()
+    {
+#ifndef NDEBUG
+      CDLLNode* item = get_next();
+      CDLLNode* p = this;
+
+      do
+      {
+        SNMALLOC_ASSERT(item->prev == p);
+        p = item;
+        item = item->get_next();
+      } while (item != this);
+#endif
+    }
+  };
+} // namespace snmalloc
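
The behaviour of the new list can be exercised with a trimmed, standalone imitation. This is a sketch, not the header above: `Node` is a hypothetical name, the debug checks and `insert_prev` are dropped, `assert` stands in for `SNMALLOC_ASSERT`, and the offset is computed through `uintptr_t` (an assumption of a flat address space) rather than through the snmalloc pointer helpers. Because the successor is stored as an offset relative to the node itself, a lone sentinel has `to_next == 0`, which is the same bit pattern as zero-filled memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Trimmed imitation of the node in the diff; names and simplifications are
// assumptions for illustration only.
class Node
{
  std::ptrdiff_t to_next = 0; // byte offset from this node to its successor
  Node* prev = nullptr;

  void set_next(Node* n)
  {
    to_next = static_cast<std::ptrdiff_t>(
      reinterpret_cast<std::uintptr_t>(n) -
      reinterpret_cast<std::uintptr_t>(this));
  }

public:
  // A single self-linked node: the "empty" list that holds only a sentinel.
  Node()
  {
    set_next(this);
    prev = this;
  }

  // Self-relative links would dangle if a node were copied elsewhere.
  Node(const Node&) = delete;
  Node& operator=(const Node&) = delete;

  bool is_empty() const
  {
    return to_next == 0;
  }

  Node* get_next()
  {
    return reinterpret_cast<Node*>(
      reinterpret_cast<std::uintptr_t>(this) +
      static_cast<std::uintptr_t>(to_next));
  }

  Node* get_prev()
  {
    return prev;
  }

  // No branch: the list always has at least the sentinel to link against.
  void insert_next(Node* item)
  {
    item->set_next(get_next());
    get_next()->prev = item;
    item->prev = this;
    set_next(item);
  }

  void remove()
  {
    assert(!is_empty());
    get_next()->prev = prev;
    prev->set_next(get_next());
  }
};
```

Inserting two nodes after a sentinel and removing them again returns the sentinel to the self-linked state, so `is_empty()` on the sentinel is the only emptiness test a client needs.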

src/ds/dllist.h

+4-4
@@ -94,12 +94,12 @@ namespace snmalloc
       return *this;
     }

-    bool is_empty()
+    SNMALLOC_FAST_PATH bool is_empty()
     {
       return head == Terminator();
     }

-    T* get_head()
+    SNMALLOC_FAST_PATH T* get_head()
     {
       return head;
     }
@@ -109,7 +109,7 @@ namespace snmalloc
       return tail;
     }

-    T* pop()
+    SNMALLOC_FAST_PATH T* pop()
     {
       T* item = head;

@@ -169,7 +169,7 @@ namespace snmalloc
 #endif
     }

-    void remove(T* item)
+    SNMALLOC_FAST_PATH void remove(T* item)
     {
 #ifndef NDEBUG
       debug_check_contains(item);

src/ds/helpers.h

+1-1
@@ -48,7 +48,7 @@ namespace snmalloc
    *
    * Wraps on read. This allows code to trust the value is in range, even when
    * there is a memory corruption.
-   **/
+   */
   template<size_t length, typename T>
   class Mod
   {
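
The hunk above shows only the comment and the opening of `Mod`, so the body is not visible here. The following is a guess at what "wraps on read" could look like, not the class from the repository: `WrapOnRead`, its `raw()` accessor, and the power-of-two restriction are assumptions. The stored value is left untouched and every read is masked into `[0, length)`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrap-on-read integer; an illustration of the comment in the
// hunk, not the definition of snmalloc's Mod.
template<std::size_t length, typename T>
class WrapOnRead
{
  static_assert(
    length != 0 && (length & (length - 1)) == 0,
    "length is assumed to be a power of two so that masking is a modulo");

  T value = 0;

public:
  // Writes store the value as given, even if it is out of range.
  WrapOnRead& operator=(T v)
  {
    value = v;
    return *this;
  }

  // Reads are masked, so a corrupted value still yields an in-range result.
  operator T() const
  {
    return static_cast<T>(value & (length - 1));
  }

  // Exposes the unmasked value, for illustration only.
  T raw() const
  {
    return value;
  }
};
```

With `length` set to 8, storing 11 and reading it back gives 3, so a consumer that indexes an 8-entry table with the read value cannot step outside it.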
