Laurenfrost's Blog.

今朝有酒今朝醉,明日愁来明日愁。


Unterstanding Rust’s Vec and its capacity for fast and efficient programs

理解 Rust 中的 Vec,以及性能问题

Contents

Originally posted on November 19, 2016, by Markus Jais.

You want your programs to be fast and memory efficient. Rust’s built-in collections give you both speed and efficiency.

The most important collection is std::vec::Vec (just called Vec below), which is a dynamically growing array similar to std::vector in C++ or ArrayList in Java (but unlike the persistent Vector in Scala).

Vec is part of std::collection and probably the collection you’re are using the most in Rust. Actually, the Rust docs recommend that in most cases you just use Vec if you need a dynamically growing array (list) or a HashMap in case you need a map.

Vec is built on top of alloc::raw_vec::RawVec. This data structure itself is not meant to be used by Rust programmers. It has a lot of unsafe code in it (“unsafe” meaning the Rust language keyword. See the Rust language documantation for details).

The good news is that by using Vec you’ll never have to deal with unsafe code yourself. It has all been taken care of by the Rust core developers and you can use Vec with all the guarantees the Rust compiler is giving you (like memory safety, no data races using threads, etc.).

In order to get the most out of Vec you need to understand what capacity means.

A Vec stores it’s data in a buffer that can grow dynamically when you add elements. You can create a new vector that has an empty buffer or you can allocate a vector which already has a certain buffer size. That is known as the vector’s capacity.

As with everything in life and in programming there are pros and cons. Dynamically growing a vector involves allocating new memory and potentially copying the data to a new memory location. That has costs.

Let’s say you start with a Vec with a capacity for holding 5 integers that are 32 bit wide (i32).

let mut vec = Vec::with_capacity(5);

That allows you to add 5 elements to the Vec. If you add a 6th element the Vec (or to be more specific the internal RawVec) needs to allocate more memory internally.

If you start with a small Vec and add lots of elements it has to reallocate several times. That could result in bad performance. In order to avoid too many reallocations Vec does allocate more than just the space for one element once it notices that a reallocation is necessary. That is because Vec assumes that you’re probably going to add even more elements and adding those will be much cheaper if there is already enough space allocated. Of course if you don’t add any further elements then allocating more memory than what is strictly necessary will wasted memory resources.

That is the trade-off when using a dynamically growing data structure. Rust allows you to tune this in certain ways if you have an idea of the amount of elements you’re going to store.

For example if you know that your Vector will need space for 15-20 elements but not more you can allocate a Vec with an initial capacity of 20 and you’ll never have to allocate new memory.

Sometimes you’ll only know during runtime that you need to store more elements in a Vec. This is were the reserve and reserve_exact methods come in.

Below we will look at a lot of examples that explain in detail how the capacity of the Vec is affected by the different methods.

Empty Vec, reserve, clear and shrink_to_fit

let mut vec: Vec<i32> = Vec::new();  // empty vector has capacity of 0

println!("capacity vec:{}", vec.capacity());

vec.reserve(8); // now capacity of 16 because vector empty and 8 will be multiplied with 2
println!("capacity vec after reserve(8):{}", vec.capacity());

vec.clear();  // clear does not affect capacity
println!("capacity vec after clear:{}", vec.capacity());

vec.shrink_to_fit();  // because vector has no elements capacity not 0 again
println!("capacity vec after clear and shrink_to_fit:{}", vec.capacity());

println!("vec:{:?}", vec);

In this example we create an empty Vec. An empty Vec does not allocate capacity and therefore it is 0. When we reserve space for 8 elements the internal RawVec allocates space for 2x as many elements resulting in a capacity of 16. clear() doesn’t affect the capacity at all, it would just remove any element in the Vec. shrink_to_fit() frees any allocated memory that is not used, in our case the Vec is back at a capacity of 0.

Empty Vec, vec! and pushing elements

let mut vec: Vec<i32> = Vec::new();  // empty
vec.push(100);  // empty vec with capacity 0 will resize to capacity of 4 when one element is added
println!("capacity vec after pushing 1 element to empty vec:{}", vec.capacity());

let mut vec: Vec<i32> = vec![1];  // capacity is 4Vec::new();  // empty
println!("capacity vec after macro initialization with 1 element: {}", vec.capacity());
vec.push(100); // non empty vectors resize capacity to double their currenty capcacity
println!("capacity vec after macro initialization with 1 element and pushing 1 element:{}", vec.capacity());

let mut vec: Vec<i32> = vec![1,2,3,4];  // capacity is 4
println!("capacity vec after macro initialization with 4 elements: {}", vec.capacity());
vec.push(100); // with capacity of 4 and 4 elements adding another element will increase capacity to 8 (2x4)
println!("capacity vec after macro initialization with 4 element and pushing 1 element:{}", vec.capacity());

Here we create an empty Vector with a capacity of 0. When we add 1 element the Vec allocates a capacity of 4. This is a special case for empty Vecs. Normally the capacity would be doubled but for empty Vecs a space of 4 elements is allocated once the first the element is added.

When we create a Vec with vec! and only 1 element in it it has a capacity of 1. After adding a 2nd element the capacity is not increased to 4 (as when adding an element to an empty Vec) but it is just doubled to 2.

Similarly when we create an Vec with vec! and 4 initial elements. Then the capacity is doubled to 8.

Empty Vec, pushing one element, reserve() and shrink_to_fit()

let mut vec: Vec<i32> = Vec::new();  // empty
vec.push(100);  // empty vec with capacity 0 will resize to capacity of 4 when one element is added
println!("capacity vec after pushing 1 element to empty vec:{}", vec.capacity());
vec.reserve(8);  // 1 element in the vector + 8 reserved and then multiplied by 2. (1 + 8) * 2 = 19
println!("capacity vec after pushing 1 element and reserve(8):{}", vec.capacity());

vec.shrink_to_fit(); // now capacity is as big as the number of elements, in this case 1
println!("capacity vec after shrink_to_fit:{}", vec.capacity());

println!("vec:{:?}", vec);  // just to prove that the element is still there after shrinking :-)


println!("\n4) ==========================================\n");

Here we create another empty Vec and push 1 element. This results in a capacity of 4. When reserving 8 more the resulting capacity is 18. Why is that? The Vec adds the number of elements (here 1) and the reserved capacity of 8 and then multiplies the value with 2. In this case this is (1 + 8 ) * 2 = 18. When we call shrink_to_fit() it will reduce the capacity back to 1.

Vec with initial capacity

let mut vec: Vec<i32> = Vec::with_capacity(5);
println!("capacity vec:{}", vec.capacity());

vec.push(1);  // adding one element to capacity with
println!("capacity after adding 1 element to empty vec with capacity 5:{}", vec.capacity());

vec.push(2);
vec.push(3);
println!("capacity after adding 3 elements to empty vec with capacity 5:{}", vec.capacity());

vec.push(4);
vec.push(5); // still no resize because number of elements is <= capacity
println!("capacity after adding 5 elements to empty vec with capacity 5:{}", vec.capacity());

vec.push(6);  // now resizing happens
println!("capacity after adding 6 elements to empty vec with capacity 5:{}", vec.capacity());

If you already know how many elements your Vec will contain (either exactly or even just approximately) you can create a Vec with an an initial capacity. Then the Vec can already allocate all the necessary memory and doesn’t have to reallocate memory once you add elements – unless you add more than the initial capacity. Here we create a Vec with an initial capacity of 5 and add a few elements. As long as we don’t add more than 5 elements, no resizing happens and the capacity stays at 5. If we add a 6th element the capacity doubles to 10.

Vec with initial capacity and a later call to reserve()

let mut vec: Vec<i32> = Vec::with_capacity(5);
vec.reserve(4); // still capacity of 5
println!("capacity of empty vec with initial capacity of 5 after reserve(4):{}", vec.capacity());

vec.reserve(6); // now capacity of 12 (6 * 2)
println!("capacity of empty vec with initial capacity of 5 after reserve(6):{}", vec.capacity());

This is another interesting example. When we create a Vec with an initial capacity of 5 reserving 4 elements doesn’t affect the capacity. When we reserve 6 elements it allocates space for twice that much (6 * 2) resulting in a capacity of 12.

  1. Vec with initial capacity, push() and a later call to reserve()
    let mut vec: Vec<i32> = Vec::with_capacity(5);
    vec.push(1);
    vec.push(2);
    vec.reserve(4); // will result in a capacity of 12. (2 + 4) * 2
    println!("capacity vec with initial capacity of 5 after adding 2 elements and reserve(4):{}", vec.capacity());

    vec.reserve(6); // no changes because capacity is already 12
    println!("capacity vec with initial capacity of 5 after adding 2 and reserve(4) and then reserve(6):{}", vec.capacity());

This is a slightly different version of the one before. Here we push 2 elements to a Vec of 5. When we later reserve space for 4 more elements the Vec resizes to 12. It adds the number of elements already in it and the number of newly reserved elements (2 + 4) and multiplies them with 2. That results in a capacity of 12 ( (2 + 4) * 6 ). Another call to reserve(6) has no effect as the Vec already has a capacity of 12 but only 2 elements in it. If we had called push() 5 more times after the first call to reserve than the 2nd call to reserve(6) would have had resulted in a reallocation. Why? Because there would have been 7 elements in the Vec (2 pushed before the 1st call to reserve() and 5 pushed after that). In that case reserve() would have added the elements in the Vec (7) to the newly reserved capacity (6) and multiplied that with 2. That would have resulted in a capacity (7 + 6) * 2, that is 26. Try it out for yourself to see the effect!.

Vec with initial capacity and reserve_exact()

let mut vec: Vec<i32> = Vec::with_capacity(5);
vec.push(1);
vec.push(2);
vec.reserve_exact(3); // 2 elements added and 3 reserved will keep the initial capacity of 5
println!("capacity vec with initial capacity of 5 after adding 2 elements and reserve_exact(3):{}", vec.capacity());

reserve_exact() is important when you want to reserve extra space but already know how much you’re going to need. This is especially important when you want to reserve a lot of extra memory, let’s say for 500,000 new elements. If you just call reserve() the allocated memory might be twice as big (see examples above) and you’ll end up with memory allocated for 1,000,000 elements. If you only need memory for 500K elements then you would waste a lot of memory. If this is in server code and you have a lot of requests calling your function at the same time, you might run out of memory. Rust immediately frees memory when no longer needed (remember, it has no GC) but if the Vecs stay around for some time (e.g. because your code is waiting in a blocking call to a database) it might cost too much and your program might stop working or crash. In this example reserve_exact(3) has no effect on the Vec‘s capacity as it already has a capacity of 5 and only 2 elements in it.

Vec with initial capacity and reserve_exact(), 2nd example

let mut vec: Vec<i32> = Vec::with_capacity(5);
vec.push(1);
vec.push(2);
vec.reserve_exact(4); // capacity now 6. 2 elements already inside the vector and reserving exactly 4 more
println!("capacity vec with initial capacity of 5 after adding 2 elements and reserve_exact(4):{}", vec.capacity());

Here we have a very similar example to the one above but now we are reserving exactly 4 elements in a Vec that has a capacity of 5 and already 2 elements in it. Because 2+4 does not fit into a capacity of 5 extra space is allocated. In this case the new capacity is 6 (2+4).

I hope all these examples shed some light on the Vec.

You can find all the examples of this article on Github: https://github.com/MarkusJais/rust-collection-examples/blob/master/src/bin/vec_capacity.rs

Other data structures building on Vec Several other data structures build on top of Vec. These include the VecMap (a Map based on Vec), BinaryHeap (a Priority Queue) or VecDeque. I will cover those data structures in more detail in upcoming articles.

A final note about Performance

In many cases tweaking the capacity and reallocation of memory for the Vector might not make a lot of difference in your application. Good architecture (e.g. avoiding needless network calls, batching of requests, etc) might have a much more noticeable impact. That said, in some cases every bit of improved performance is better, e.g. when you’re writing a new NoSQL database, a big data processing tool or other software that needs to deal with a lot of data or a lot of requests or both. If you use Vec a lot in that kind of code, knowing how it works internally is crucial. BTW, Rust is a great language for writing high-performance NoSQL databases or big data processing tools and I expect to see more than one in the future).

The Rust team has put a lot of effort into making Vec work as good as it gets. Don’t try to reinvent the wheel. It is a fantastic data structure. Only come up with something new if you’ve done tests and really, really (I mean really!!!) know that Vec is not good enough for your requirements.

This might change

The code was tested with the stable Rust 1.13 version. The exact algorithms for reserve() are implementation details, so this might change in future versions of Rust.