Magma3D Dev-Log 2

2024-11-04

In the last post, I promised a more detailed look at how I achieved multithreading in my ECS. As usual, things weren't quite as easy as I thought. Because of this, I also haven't gotten around to do much work on the renderer. But here you go, a deep dive into all the changes in magma_ecs.

The First Iteration

When I initially implemented multithreading, I implemented a pretty naive approach. All I essentially did was put the Entities and Resources structs of the World into an RwLock. I then provided a method to get either a write or read lock for them. This would (I thought) be an easy and user-friendly API, but I soon realized the many flaws of my solution.

(I also switched a lot of internal iterators to rayon's parallel iterators.)

1. A lot of performance is left on the table:

If operating on data inside the Entities struct requires you to lock the whole thing, you might as well do all the operations sequentially. The only really gain in performance would be operations that don't require that data.

2. The API is very confusing and counter-intuitive:

Above I mentioned operations that don't require data from the struct would benefit from multithreading, but this is only true if you use scopes to drop your locks after doing everything you need; otherwise, the lock is only dropped after the function finishes execution, essentially defeating the purpose of multithreading. This is not intuitive, and you have to know about it, as the compiler won't tell you.
Another thing is the API for getting components.

Instead of doing entity.get_component::<T>().unwrap() you now have to first get the component and then downcast it to the same type you gave to the method again. This is very bad!


// instead of this  
entity.get_component::<T>().unwrap();

// you have to do this (BAD!)  
let lock = entity.get_component::<T>().unwrap();  
lock.downcast_ref::<T>().unwrap();

3. Deadlock potential:

Using this API, it becomes very easy to create deadlocks. This means you have to know how RwLock works under the hood. This fails to achieve the goal of automatically running user code in parallel.

All these things led me to essentially rewrite the whole thing!

New API for Operating on Locked Data

After contemplating solutions for all the problems I had found for some time, I came up with an idea that would seemingly eliminate all of them: using closures.
I quickly added an issue, describing my idea. Initially I was just going to modify world.entities_write(), so instead of getting a lock and then doing something with it, you would supply a closure with all the things you wanted to do, eliminating the need for scopes (and reducing the chance of creating accidental deadlocks).

Implementing this, I thought about the other flaws mentioned above, which got me to rethink the implementation again. So I came up with the following proposal:


world.query.with_component::<Component>().unwrap().run(|query_entities| {  
    // operate on entites  
});

This, I would soon realize, forced me to redo a whole lot of other things, which I then also implemented using closures.
Here is an example for using some of the new API:


use magma_ecs::World;

let mut world = World::new();  
world.register_component::<i32>().unwrap();  
world.create_entity().with_component(10_i32).unwrap();

world.query()  
    .with_component::<i32>()  
    .unwrap()  
    .run(|entities| {  
        for entity in entities {  
            entity  
                .component_mut(|component: &mut i32| {  
                    *component += 2;  
                })
                .unwrap();  
        }
    });

We first create a World and within it an entity with an i32 component of value 10. Then we query the world for entities with i32 components and increase their value by 2. This in itself is not multithreaded, as it is not done using systems, which also have a new API now.

I achieved this by moving the RwLocks from this:


// lib.rs

// ...

#[derive(Default, Debug)]  
pub struct World {  
    resources: RwLock<Resources>,  
    entities: RwLock<Entities>,
}

// ...

into the Resources and Entities structs, like this:


// entities.rs

//...

pub(crate) type Component = Arc<RwLock<dyn Any + Send + Sync>>;  
pub(crate) type ComponentMap = HashMap<TypeId, RwLock<Vec<Option<Component>>>>;

#[derive(Default, Debug)]  
pub struct Entities {  
    components: ComponentMap,  
    // ...
}

// ...

Notice the RwLocks in the Component and ComponentMap types.


// resources.rs

// ...

#[derive(Default, Debug)]  
pub struct Resources {  
    data: RwLock<HashMap<TypeId, Arc<RwLock<dyn Any + Send + Sync>>>>,  
}

// ...

I later implemented an API similar to the Entities' one for resources (see issue 20).

Systems and Dispatcher

Another thing I wanted to address was systems depending on other systems. If, for example, one system spawns an entity and another system modifies a specific component of all entities, ideally the second system would wait for the first system to finish so it can operate on all the entities.
This is not possible when all the systems execute in parallel for obvious reasons.

To give systems the ability to depend on each other, I came up with two things: the Systems and Dispatcher structs.


// systems.rs

// ...

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]  
pub(crate) struct System {  
    pub run: fn(&World),  
    pub name: &'static str,  
    pub deps: &'static [&'static str],  
}

// ...

#[derive(Default, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]  
pub struct Systems(pub(crate) Vec<System>);

// ...

The Systems struct holds a Vec<System>. Each System has a name, its dependencies, and a pointer to its function.


// dispatcher.rs

// ...

#[derive(Default, Debug, Clone)]  
pub struct Dispatcher(Vec<Vec<fn(&World)>>);

// ...

From Systems, a Dispatcher can be built. The dispatcher holds only the function pointers, but in a 2-dimensional Vec. This Vec acts like stages, where all the systems in one stage run in parallel, but the stages themselves run sequentially. We can create a Dispatcher from Systems like this:

1. Iterate through all systems and put those whose dependencies are already in the dispatcher into a new stage. The first stage are all the systems without dependencies.
2. Add the stage to the dispatcher.
3. Repeat until all systems are in the dispatcher.

I will still be coming back to this in the future to improve some things.

The final change

The last thing on my list now had to do with bitmaps.
In my last post, I didn't really go into the inner workings of my ECS. I won't do this now either, but I will talk about a performance optimization I did.

I hid this until now, but the whole Entities struct actually looked like this until recently:


#[derive(Debug, Default)]  
pub struct Entities {  
    components: ComponentMap,  
    bit_masks: HashMap<TypeId, u128>,  
    map: RwLock<Vec<u128>>,  
    into_index: RwLock<usize>,  
}

The way it works is like so: For every type of component, a bit mask is created in the form of an u128 integer and stored in the bit_masks HashMap along with its TypeId. For entities, map stores which components they have by adding their bit masks together. Using this, we can very quickly check and modify entities by just modifying their bitmaps. E.g., if we want to remove a component from an entity, we just set the corresponding bit in the bit map to 0, never actually touching the component's data.

The only limitation this comes with is a limit of 128 different component types because we only have 128 different bits to represent them. Which is what I wanted to address.

I went with an easy solution of using an existing library that implements bitmaps. My crate of choice was roaring, which seemed to provide the most performance while also using compression and having no size limit for bit maps (technically u32 integer limit).

As you can see, not a lot has changed except the RoaringBitmap stuff, as well as some of the implementation details, which I don't show here.


#[derive(Debug, Default)]  
pub struct Entities {  
    components: ComponentMap,  
    bit_masks: HashMap<TypeId, u32>,  
    map: RwLock<Vec<RoaringBitmap>>,  
    into_index: RwLock<usize>,
}

When rust can't do compile-time checks...

...I instantly write bugs.

At this point, I thought I was done (with this version at least). But as you might have guessed, this wasn't quite the case. While writing tests for all the new functionality, I discovered a new bug.

I wrote a test for the new Systems. Which had two systems execute in parallel, that looked like this:


fn system_1(world: &World) {  
    world.create_entity().with_component(1_u32).unwrap();  
}

fn system_2(world: &World) {  
    world.create_entity().with_component(2_u32).unwrap();  
}

At first this executed just fine, but on consecutive runs it would sometimes just hang and don't complete. I immediately thought of how RwLocks can deadlock when one thread acquires a write lock in between two read locks from another thread.

From the Rust std documentation on RwLock:


// Thread 1              |  // Thread 2  
let _rg1 = lock.read();  |  
                         |  // will block  
                         |  let _wg = lock.write();  
// may deadlock          |  
let _rg2 = lock.read();  |

I started searching the create_entity() method:


// entities.rs

// ...

impl Entities {  
    // ...  

    pub(crate) fn create_entity(&self) -> &Self {  
        {  
            let mut map = self.map.write().unwrap();  
            if let Some((index, _)) = map  
                .par_iter()  
                .enumerate()  
                .find_any(|(_, mask)| mask.is_empty())  
            {  
                *self.into_index.write().unwrap() = index;  
            } else {  
                self.components  
                    .par_iter()  
                    .for_each(|(_key, components)| components.write().unwrap().push(None));  
                map.push(RoaringBitmap::new());  
                *self.into_index.write().unwrap() = map.len() - 1;  
            }  
        }

        self  
    }

    // ...  
}

// ...

Can you find the bug?
No?
That's because there is none.

Let's work through the function:

1. We get a write lock on our entities bitmap.
2. We check for some index in the bitmaps that is empty.
3. If we find an empty spot, we set self.into_index to that index. Otherwise, we create a new empty spot by pushing None values for all components and an empty bitmap to the bitmaps. Then we set into_index to map.len() - 1.

You might have some questions. For example:
What is self.into_index for?
Why show this when there is no bug?

Both of those questions have to do with the actual bug. But to understand it, we first need a second method of the Entities struct:


// entities.rs

// ...

impl Entities {  
    // ...  
    
    pub fn with_component(&self, data: impl Any + Send + Sync) -> Result<&Self, EntityError> {  
        let type_id = data.type_id();  
        {  
            let index = self.into_index.read().unwrap();  
            if let Some(components) = self.components.get(&type_id) {  
            let mut components = components.write().unwrap();  
            let component = components  
                .get_mut(*index)  
                .ok_or(EntityError::ComponentNotRegistered)?;  
            *component = Some(Arc::new(RwLock::new(data)));

            let bit_mask = self.bit_masks.get(&type_id).unwrap();  
            self.map.write().unwrap()[*index] |= bit_mask;  
            } else {  
                return Err(EntityError::ComponentNotRegistered);  
            }  
        }

        Ok(self)  
    }

    // ...  
}

// ...

You might recall this method, as I also used it in the systems that led to the discovery of the bug. And as you can see, self.into_index appears again. We use it to determine the entity we are adding components to on entity creation. Can you see all the problems this method creates in combination with the first one?
Don't worry, I couldn't either.

So the first thing I did after this was switch to parking_lot's RwLocks. This is because parking_lot has a deadlock_detection feature and also comes with the benefit of better performance than std's solution.

Using deadlock_detection, I got closer to the cause of my problem:


8:      0x55977c075a63 - magma_ecs::entities::Entities::with_component::h15a283d76a106c15  
                                at src/entities.rs:72:13  
9:      0x55977c059d3f - systems::system_2::he05bddf393f97622  
                                at tests/systems.rs:65:5  
10:     0x55977c0c94e6 - magma_ecs::systems::dispatcher::Dispatcher::dispatch::{{closure}}::{{closure}}::h1bfc140e2a7ba516  
                                at src/systems/dispatcher.rs:68:17

This backtrace points to with_components(), which got me on the right track.


8:      0x559f5a53e973 - magma_ecs::entities::Entities::create_entity::{{closure}}::hf5bb51bbccefb4a9  
                                at src/entities.rs:51:48

And this other backtrace points to create_entitiy().

In the end, it all had to do with the into_index field.
Remember the potential deadlock example from the standard library docs? Because create_entitiy() as well as with_component() acquire locks to self.into_index, it creates exactly that scenario when run on multiple threads at the same time. But the maybe even bigger flaw is that, should a thread manage to call create_entity() in between another thread's calls to create_entity() and with_component(), it would screw up the whole entity creation process because the into_index changed.

I decided to write a new implementation for create_entity() that would directly add the components provided to the entity.

Here it is:


// entities.rs

// ...

impl Entities {
	// ...
	
	pub(crate) fn create_entity(&self, components: impl ComponentSet) -> Result<(), EntityError> {
        let mut map = self.map.write();
        let mut result = Ok(());
        if let Some((index, _)) = map
            .par_iter()
            .enumerate()
            .find_any(|(_, mask)| mask.is_empty())
        {
            components.for_components(|type_id, component| {
                if let Some(component_vec) = self.components.get(&type_id) {
                    let mut component_vec = component_vec.write();
                    let component_stored = component_vec.get_mut(index).unwrap();
                    *component_stored = Some(component);

                    let bit_mask = self.bit_masks.get(&type_id).unwrap();
                    map[index].insert(*bit_mask);
                } else {
                    result = Err(EntityError::ComponentNotRegistered);
                };
            });
            result
        } else {
            self.components
                .par_iter()
                .for_each(|(_, components)| components.write().push(None));
            map.push(RoaringBitmap::new());

            let index = map.len() - 1;
            components.for_components(|type_id, component| {
                if let Some(component_vec) = self.components.get(&type_id) {
                    let mut component_vec = component_vec.write();
                    let component_stored = component_vec.get_mut(index).unwrap();
                    *component_stored = Some(component);

                    let bit_mask = self.bit_masks.get(&type_id).unwrap();
                    map[index].insert(*bit_mask);
                } else {
                    result = Err(EntityError::ComponentNotRegistered);
                };
            });
            result
        }
    }
    
    // ...
}

This takes a parameter that implements ComponentSet, which is a trait I defined to be able to add multiple different types of components at the same time.


// component_set.rs

// ...

pub trait ComponentSet {
    fn for_components<R: FnMut(TypeId, Arc<RwLock<dyn Any + Send + Sync>>)>(self, run: R);
}

// ...

I implemented this for tuples up to the length of ten. Maybe I will refactor this into a macro in the future.

This trait also adds the possibility of user-defined ComponentSets. But this is a thing to think about in the future.

Putting everything together

At last, I have provided an example of all the new functionality working together. I didn't go into create_entity_batch() because it is just create_entity() but optimized for creating large amounts of the same entity. Maybe in a future blog post?


// main.rs

use magma_ecs::World;
use magma_ecs::systems::Systems;


fn main() {
	let mut world = World::new();

	// register our component types
	world.register_component::<Position>();
	world.register_component::<Rotation>();

	// add update counter resource
	world.add_resource(UpdateCounter(0)).unwrap();

	// create entities with Position and Rotation components
	world.create_entity_batch((Position(0, 0, 0), Rotation(0, 0, 0)), 100).unwrap();

	// create and run dispatcher
	let systems = Systems::new()
	    .with(spawn_entity, "spawn_entity", &[])
	    .with(move_entities, "move_entities", &["spawn_entity"])
	    .with(delete_when_limit, "delete_when_limit", &["spawn_entity"])
	    .with(update_count, "update_count", &["move_entities", "delete_when_limit"]);
	let dispatcher = systems.build_dispatcher();

	loop {
		dispatcher.dispatch(&world);
	}
}

// our systems
fn move_entities(world: &World) {
	world.query()
		.with_component::<Position>()
		.unwrap()
		.with_component::<Rotation>()
		.unwrap()
		.run(|entities| {
			for entity in entities {
                entity
                    .component_mut(|component: &mut Position| {
                        component.0 += 2;
                    })
                    .unwrap();
            }
		});
}

fn spawn_entity(world: &World) {
	world.create_entity((Position(0, 0, 0), Rotation(0, 0, 0))).unwrap();
}

fn delete_when_limit(world: &World) {
	world.query()
		.with_component::<Position>()
		.unwrap()
		.with_component::<Rotation>()
		.unwrap()
		.run(|entities| {
			if entities.len() > 200 {
				for entity in entities {
					entity.delete();
				}
			}
		})
}

fn update_count(world: &World) {
	world.resource_mut(|count: &mut UpdateCounter| count += 1).unwrap();
}

// Our component types
struct Position(i32, i32, i32);
struct Rotation(i32, i32, i32);

// Our resources
struct UpdateCounter(u32);

See you in the next post :3