3D Scenes

Key Points

  • [ ] How do we draw more complex objects than textured quads?
    • [ ] What data do we need to draw 3D models?
  • [ ] What is a scene graph?
  • [ ] What are some benefits and drawbacks of transform hierarchies?
  • [ ] How can scene graph structures help with spatial partitioning?

3D Objects

Meshes

So far we've drawn single triangles (for our framebuffer graphics) and pairs of triangles (for our sprites). 3D games often have objects with three, four, or even five triangles, and game scientists are working to produce games with even more triangles than that! The fundamentals are the same as our little textured quads, though: we give the GPU some vertices with positions and UV coordinates, along with a texture to sample from—bind a vertex buffer, bind an index buffer, bind a descriptor set with a texture and sampler, and we're off to the races.

There are, however, a few important details that differ as models become more complex. First, we presently move objects around by recreating all their vertices every frame. If our object had thousands of vertices this could start to get expensive, especially with many objects in the scene. Second, we might have several occurrences of the same type of object in the scene, and this would mean a lot of wasted space and work. We're going to address these problems by separating out the idea of the mesh (the vertices composing the model) from the specific transform used to position the mesh in the world.

For example, if our 3D object were a cube, its mesh would be made of 12 triangles. We could define one such set of vertices and use it to draw the same cube at various positions, or rotated, or scaled, anywhere in the game world. In fact, we could draw several copies of the cube in different orientations with the exact same vertex buffer (and even the same draw call!). We'll discuss how to do this later on today.

For loading meshes, we'll use the russimp crate, which is a wrapper around the Open Asset Importer library (assimp). It can handle a variety of 3D formats, mushing them into common-denominator data structures. If you use the following in your Cargo.toml you can pull it in without needing any external dependencies (or so we can hope):

[dependencies]
russimp = {version="1.0.2", features=["prebuilt"]}

Once that's in there, run cargo doc to generate documentation inside of your target/doc folder (the docs.rs documentation is currently busted). To load the meshes from a file and print out some stats, we'll do something like this:

use russimp::scene::{PostProcess,Scene};
let scene = Scene::from_file(
    &file_path,
    vec![PostProcess::Triangulate, PostProcess::JoinIdenticalVertices, PostProcess::FlipUVs]
)?;
for mesh in scene.meshes {
    // Vertices, uv_components, and texture_coords are for our vertex buffer
    dbg!(mesh.vertices.len());
    dbg!(mesh.uv_components);
    dbg!(&mesh.texture_coords[0].as_ref().map(|p| p[0]));
    // Faces are where we get our indices from for the index buffer
    dbg!(mesh.faces.len());
    dbg!(&mesh.faces[0]);
}

It may be helpful to figure out a bounding sphere or oriented box for each thing, which we can do by looking at the distance of each mesh point from the mesh's center (just one distance for a sphere, or separate x,y,z distances for boxes). This may be useful for collision later on, but it's also important for culling and other purposes (after all, the fastest way to draw something is not to draw it).
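
As a sketch of the sphere case (assuming the russimp types above; averaging the vertices only approximates the center, and something like Ritter's algorithm would fit tighter):

use russimp::mesh::Mesh;

// A rough bounding sphere: average the vertex positions to approximate a
// center, then take the maximum distance to any vertex as the radius.
fn bounding_sphere(mesh: &Mesh) -> ([f32; 3], f32) {
    let n = mesh.vertices.len() as f32;
    let center = mesh.vertices.iter().fold([0.0f32; 3], |c, v| {
        [c[0] + v.x / n, c[1] + v.y / n, c[2] + v.z / n]
    });
    let radius = mesh
        .vertices
        .iter()
        .map(|v| {
            let (dx, dy, dz) = (v.x - center[0], v.y - center[1], v.z - center[2]);
            (dx * dx + dy * dy + dz * dz).sqrt()
        })
        .fold(0.0_f32, f32::max);
    (center, radius)
}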

In a real engine, you'd probably want to take in a custom binary file format rather than dealing with russimp directly (you might have a build stage or pipeline that transforms fbx files or whatever into your custom format). You'd have a lot of assumptions baked in like a standard model scale, which axis represents which direction, what kind of vertex data are necessary, and so on. But for now we'll deal with these generic file formats and interpret them on load, like in the following example. Note that many lines indicate assumptions we're making about the contents of the file, so beware! Those might be opportunities to add parameters later on.

pub fn load_mesh(&mut self, path: &std::path::Path, scale:f32) -> Result<MeshRef> {
    // Give each mesh a unique number
    // TODO: look up path and make sure we haven't loaded it already!
    let mid = self.next_mesh;
    self.next_mesh += 1;
    {
        use russimp::scene::{PostProcess,Scene};
        let mut scene = Scene::from_file(
            path.to_str().ok_or(eyre!("Mesh path can't be converted to string: {:?}",path))?,
            // Assuming a lot about how the meshes and UVs are defined here...
            vec![PostProcess::Triangulate, PostProcess::JoinIdenticalVertices, PostProcess::FlipUVs]
        )?;
        // Assume we have one mesh per file
        let mesh = scene.meshes.swap_remove(0);
        let verts = &mesh.vertices;
        // Assume the mesh has one "layer" of UV coordinates
        let uvs = mesh.texture_coords.first().ok_or(eyre!("Mesh has no texture coords: {:?}",path))?.as_ref();
        let uvs = uvs.ok_or(eyre!("Mesh doesn't specify texture coords: {:?}",path))?;
        // Triangulate should guarantee triangles, but sanity-check the first face
        ensure!(mesh.faces[0].0.len()==3,"Mesh face doesn't have exactly 3 indices: {:?}",mesh.faces[0]);
        let faces:Vec<u32> = mesh.faces.iter().flat_map(|v| { v.0.iter().copied()}).collect();
        let (vb,vb_fut) = vulkano::buffer::ImmutableBuffer::from_iter(
            // Apply a scaling factor to each component of the position
            verts.iter().zip(uvs.iter()).map(|(pos,uv)| Vertex{position:[pos.x*scale,pos.y*scale,pos.z*scale], uv:[uv.x,uv.y]}),
            vulkano::buffer::BufferUsage::vertex_buffer(),
            self.vulkan_config.queue.clone()
        )?;
        let (ib,ib_fut) = vulkano::buffer::ImmutableBuffer::from_iter(
            faces.into_iter(),
            vulkano::buffer::BufferUsage::index_buffer(),
            self.vulkan_config.queue.clone()
        )?;
        // Next frame should wait on the buffers being uploaded
        let load_fut = vb_fut.join(ib_fut);
        let old_fut = self.vulkan_state.previous_frame_end.take();
        self.vulkan_state.previous_frame_end = match old_fut {
            None => Some(Box::new(load_fut)),
            Some(old_fut) => Some(Box::new(old_fut.join(load_fut))),
        };
        // Remember this mesh and buffers for later
        self.meshes.insert(MeshRef(mid), (Rc::new(mesh), vb, ib));
    }
    Ok(MeshRef(mid))
}

Materials & Textures

We'll discuss materials more when we get into lighting, but for now you can think of a material as a combination of some textures (and their samplers) and a particular expectation around the vertex data and shaders (e.g., that there is one "layer" of UV coordinates, or that vertices also have color data, or…). Given that we can swap out textures by binding different descriptor sets, we have a roughly 1:1 correspondence between material families and Vulkan pipelines. So for now, we might have one pipeline for textured objects and another for UI elements. If we wanted to have both textured objects and vertex-colored objects, we'd need to have two different pipelines.

In general we want to minimize the number of distinct material families to minimize the number of different pipelines we're using—changing the pipeline is expensive. In the same way, we want to minimize the number of members of each material family; using the same set of textures for all the objects we want to draw would be ideal, because then we could reduce the number of bindings we need to manage and reduce the number of draw calls. You can think of it like golf: a lower draw call count is a better score, and keeping it low for our scene lets us save our rendering budget for other things like lighting later on.

Instanced Rendering

Keeping the number of distinct materials low is important, but it's only part of the story for minimizing drawcalls. Very often, we'll have some geometry (another word for mesh) that we'll want to render in many parts of the scene. For example, we might have a bunch of enemies using the same mesh, or a bunch of pickups like coins, or decorative assets like soda cans lying around. These might differ mainly in their position and orientation, but also in data like which "cel" of a UV animation they're currently playing or how far along they are in a skeletal animation, or some other parameter. We can separate out the mesh data from the model transform and from other instance-specific data using instanced rendering.

In instanced rendering, we introduce a new type of buffer: instance buffers. These are used with a more complex version of the draw command. So far when we have issued a draw_indexed command, we told Vulkan to draw a certain number of indices pulled from an index buffer, referring to a vertex buffer. There are two other important parameters of draw_indexed: instance_count and first_instance. draw_indexed will draw the requested range of indices from the index buffer again for each instance; each time around, it will do two things differently:

  1. It will increment the gl_InstanceIndex global variable in the shader.
  2. It will use the next piece of data from each of the bound instance buffers.

An instance buffer carries vertex data just like a vertex buffer, except the same instance data is used for every vertex in each instance. So, if we were to use the cube example from before, we'd want to bind an instance buffer with position data for each cube. Then when we asked Vulkan to draw, it would first draw the cube's 36 indices (12 triangles) with the first cube's instance data; then those same 36 indices again with the second cube's instance data; and so on. We tell Vulkan that we plan to use instance buffers when we create the pipeline:

let pipeline = GraphicsPipeline::start()
    // New!
    .vertex_input_state(BuffersDefinition::new().vertex::<Vertex>().instance::<BatchInstanceData>())
    .vertex_shader(vs.entry_point("main").unwrap(), ())
    .input_assembly_state(InputAssemblyState::new()
                          .topology(vulkano::pipeline::graphics::input_assembly::PrimitiveTopology::TriangleList)
    )
    .viewport_state(ViewportState::viewport_dynamic_scissor_irrelevant())
    .fragment_shader(fs.entry_point("main").unwrap(), ())
    // This is new too, worth doing!
    .rasterization_state(RasterizationState::new()
                         .cull_mode(vulkano::pipeline::graphics::rasterization::CullMode::Back)
                         .front_face(vulkano::pipeline::graphics::rasterization::FrontFace::CounterClockwise)
    )
    // Also new!  More about this later
    .depth_stencil_state(vulkano::pipeline::graphics::depth_stencil::DepthStencilState::simple_depth_test())
    .render_pass(Subpass::from(render_pass, 0).unwrap())
    .build(device.clone())
    .unwrap();

Then, we define the format of our instance data. In this example, each object will define a 4x4 homogeneous transform matrix to position and orient it in the world.

#[repr(C)]
#[derive(Clone,Copy,Zeroable,Default,Pod,Debug,PartialEq)]
struct BatchInstanceData {
    model:[f32;4*4],
}
vulkano::impl_vertex!(BatchInstanceData, model);

Next, our shader will need to use this instance data. Here, we combine the view-projection matrix (constant across all objects) with the model matrix (constant within each instance) to get a transform we can apply to each vertex position. We can compute a projection matrix using ultraviolet via ultraviolet::projection::perspective_vk(vertical_fov, aspect, znear, zfar), where vertical FOV is in radians (90 degrees is typical), aspect is a ratio of width to height, znear is the distance to the camera's near Z plane (use a number different from 0!) and zfar is the distance to the far Z plane. Multiplying that with a view matrix (for camera positioning and movement, e.g. from ultraviolet::Mat4::look_at(eye, at, up)) will give a transformation that converts from world coordinates to homogeneous clip-space coordinates, which is what we want coming out of the vertex shader.
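
As a sketch, building that uniform might look like this (eye, at, and aspect are assumed to come from your camera or window setup):

use ultraviolet::{projection, Mat4, Vec3};

// 90-degree vertical FOV (in radians), near plane at 0.1, far plane at 1000 units
let proj = projection::perspective_vk(std::f32::consts::FRAC_PI_2, aspect, 0.1, 1000.0);
// Look from the camera position toward a target point, with +y as up
let view = Mat4::look_at(eye, at, Vec3::unit_y());
// World space -> clip space; each instance's model matrix takes us
// from model space to world space first
let viewproj = proj * view;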

#version 450

layout(set=0, binding=0) uniform BatchData { mat4 viewproj; };
layout(location = 0) in vec3 position;
layout(location = 1) in vec2 uv;
layout(location = 2) in mat4 model; // a mat4 attribute occupies four locations (2, 3, 4, and 5)
layout(location = 0) out vec2 out_uv;
void main() {
  gl_Position = viewproj*model*vec4(position.xyz, 1.0);
  out_uv = uv;
}

Finally, our binding and draw call look slightly different:

builder
    .bind_pipeline_graphics(scheme.pipeline.clone())
    .bind_vertex_buffers(0, [self.verts.clone()])
    // new!
    .bind_vertex_buffers(1, [self.instance_data.clone()])
    .bind_index_buffer(self.idxs.clone())
    .bind_descriptor_sets(
        vulkano::pipeline::PipelineBindPoint::Graphics,
        (*scheme.pipeline).layout().clone(),
        0,
        uniforms.clone(),
    )
    .bind_descriptor_sets(
        vulkano::pipeline::PipelineBindPoint::Graphics,
        (*scheme.pipeline).layout().clone(),
        1,
        self.material_pds.clone(),
    )
    // new!
    .draw_indexed(self.idxs.len() as u32, self.instance_data.len() as u32, 0, 0, 0)
    .unwrap();

The last little change we'll want to make is to use a depth buffer to let the depth of objects in the scene determine what covers up what, rather than using the draw ordering. We'll need to change our render pass:

let render_pass = vulkano::single_pass_renderpass!(
    device.clone(),
    attachments: {
        color: {
            load: Clear,
            store: Store,
            format: swapchain.image_format(),
            samples: 1,
        },
        // New!
        depth: {
            load: Clear,
            store: DontCare,
            format: vulkano::format::Format::D32_SFLOAT,
            samples: 1,
        }
    },
    pass: {
        color: [color],
        // New!
        depth_stencil: {depth}
    }
)
.unwrap();

A depth buffer is another attachment that is rendered into. While the fragment shader draws colors into the color attachment, the fragment stage also records the depth (z-value, or distance from the screen) of each fragment into the depth attachment. We want to clear the depth buffer with 1.0 every frame, and we won't read the depth buffer after rendering is done; the important thing is that the data is there for the GPU to read when figuring out which fragments should get rendered to pixels.
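
That means one clear value per attachment when we begin the render pass; with the vulkano API used in these notes that might look something like the following (newer vulkano versions take a RenderPassBeginInfo instead):

builder
    .begin_render_pass(
        framebuffers[image_num].clone(),
        vulkano::command_buffer::SubpassContents::Inline,
        // One clear value per attachment: the color clear, then 1.0 for depth
        vec![[0.0, 0.0, 0.0, 1.0].into(), 1.0f32.into()],
    )
    .unwrap();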

Since we need a depth attachment, we also need to tweak the code that sets up our framebuffer to have a depth buffer for each frame buffer image:

fn window_size_dependent_setup(
    device:Arc<Device>,
    images: &[Arc<SwapchainImage<Window>>],
    render_pass: Arc<RenderPass>,
    viewport: &mut Viewport,
) -> Vec<Arc<Framebuffer>> {
    let dimensions = images[0].dimensions().width_height();
    viewport.dimensions = [dimensions[0] as f32, dimensions[1] as f32];
    images
        .iter()
        .map(|image| {
            let view = ImageView::new_default(image.clone()).unwrap();
            let depth_buffer = ImageView::new_default(
                AttachmentImage::with_usage(
                    device.clone(),
                    dimensions,
                    vulkano::format::Format::D32_SFLOAT,
                    ImageUsage {
                        depth_stencil_attachment: true,
                        transient_attachment: true,
                        ..ImageUsage::none()
                    },
                ).unwrap(),
            ).unwrap();

            Framebuffer::new(render_pass.clone(),
                             vulkano::render_pass::FramebufferCreateInfo {
                                 attachments:vec![view,depth_buffer],
                                 ..Default::default()
                             }
            ).unwrap()
        })
        .collect::<Vec<_>>()
}

And with that we're done!

3D Scenes

In the example above, we don't really arrange our objects in any meaningful way; sure, everything with the same model and texture is grouped together, but this isn't really a gameful way to organize things. It sounds a little inconvenient, for example, for a hypothetical donut launcher in our game to store a donut_batch_id for each different type of donut it needs to launch. Moreover, a single game object might change its texture or mesh over time for a variety of reasons, so there isn't necessarily a one-to-one relationship between game objects and rendered instances.

We're going to fix this by separating the job of rendering models from the job of representing a game scene. To get there, we'll put the Vulkan code into one or more Renderer structs (for example, a SpriteRenderer, a TexturedMeshRenderer, etc) which combine what we've previously called Scheme and Data. Our game engine will then need to communicate what should be drawn to the renderer every frame; for example, it could compute the inputs to each renderer fresh every frame based on what the in-game camera is looking at, or it could update render data whenever gameplay data changes.

self.mesh_renderer.clear_frame();
self.sprite_renderer.clear_frame();

for obj in self.objects.iter() {
    if let Some(model_data) = obj.get_model() {
        self.mesh_renderer.add_renderable_at(model_data, obj.transform());
    }
    if let Some(sprite_data) = obj.get_sprite() {
        self.sprite_renderer.add_renderable_at(sprite_data, obj.transform());
    }
    // ...
}
self.mesh_renderer.draw(self.vulkan);
self.sprite_renderer.draw(self.vulkan);

This approach can work well if expensive state (which models and meshes are being used, vertex buffer allocations, etc.) is preserved from frame to frame, and it makes rendering interpolated game states relatively easy. You might not have methods like get_model and get_sprite, but instead use something like obj.render_meshes(&mut self.mesh_renderer) and obj.render_sprites(&mut self.sprite_renderer).

How can we still have batching if we're just shuffling (mesh, texture) pairs into the mesh renderer willy-nilly? We'll need to use a HashMap or something to do this grouping-together of batches. Then we can look up the model's key (the mesh and the texture) and use the existing batch data to draw the new model (or create new buffers if need be). This dynamic batching is good if you don't have huge numbers of things with the same texture and mesh; if you do, combining static and dynamic batching may be appropriate (for example, by giving TexturedMeshRenderer a method to add a batch all at once).
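
Here's a sketch of that dynamic batching; ModelData and TextureRef are hypothetical, and MeshRef (from load_mesh earlier) is assumed to derive Copy, Eq, and Hash:

use std::collections::HashMap;
use ultraviolet::{Isometry3, Mat4};

// Hypothetical: what a game object hands the renderer
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct TextureRef(usize);
#[derive(Clone, Copy)]
pub struct ModelData {
    mesh: MeshRef,
    texture: TextureRef,
}

#[derive(Default)]
struct Batch {
    // Model matrices gathered this frame; converted to BatchInstanceData
    // and uploaded to an instance buffer in draw()
    models: Vec<Mat4>,
}

struct TexturedMeshRenderer {
    batches: HashMap<(MeshRef, TextureRef), Batch>,
    // ...plus cached vertex/index buffers per MeshRef, the pipeline, etc.
}

impl TexturedMeshRenderer {
    fn add_renderable_at(&mut self, model: ModelData, trf: Isometry3) {
        // Find (or create) the batch for this mesh/texture pair and
        // append this object's model matrix to it
        self.batches
            .entry((model.mesh, model.texture))
            .or_insert_with(Batch::default)
            .models
            .push(trf.into_homogeneous_matrix());
    }
}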

What is a game object anyway?

What are "things" in the game anyhow? This, like everything else, depends on your engine! You have a spectrum of options here:

  1. Make your engine specific to a certain type of game with e.g. world geometry, one or more player objects, ball objects, enemy objects, and other specific types.
  2. Make your engine as generic as possible, supporting arbitrary data associated with individual objects or groups of objects.

For (1), your engine can define the necessary types and expose them. This can be appropriate e.g. for an RPG-maker engine or an engine for making Super Mario Bros. clones. Even in these cases, though, you might want to expose some extra_data field on each object or at the level of the game engine itself, and this can be done simply (and somewhat slowly) with a HashMap<DataKey,f32> or similar (imagine that DataKey is an enum or struct on which the engine is parameterized).
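
A minimal sketch of that, with hypothetical DataKey variants:

use std::collections::HashMap;
use std::hash::Hash;

// The engine stays generic over the key type...
struct GameObject<K: Eq + Hash> {
    // ...
    extra_data: HashMap<K, f32>,
}

// ...and the game defines the keys
#[derive(PartialEq, Eq, Hash)]
enum DataKey {
    Health,
    CoinsCollected,
}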

If we want to store arbitrary data so that any kind of game can be made, we need to get a little fancy. Going up the scale of fanciness, we could start by giving every game object a data field of some type D (that won't be allowed to contain references):

struct Engine<D> {
    //...
    objects: Vec<GameObject<D>>,
    //...
}
struct GameObject<D> {
    transform:Isometry3,
    data:D
}

An example of providing D might be:

enum GameThing {
    Generic,
    Player(/*PlayerState*/),
    Enemy(/*EnemyState*/)
}

// ...
let mut engine: engine::Engine<GameThing> = engine::Engine::new(...);
// ...
engine.create_game_object(
    Some(&model),
    Isometry3::new(
        Vec3::new(0.0, -12.5, 25.0),
        Rotor3::identity()
    ),
    GameThing::Generic
);
// ...
engine.play(|engine| {
    for obj in engine.objects_mut() {
        obj.move_by(Vec3::new(1.0, 1.0, 0.0) * DT);
        dbg!(obj.data_mut());
    }
})

This is okay, but it means every object has the same size in memory. So if you have an object with lots of state (like the player) it will have the same footprint as an object with no state (like an environment decoration). So it may be worthwhile to do something like this instead:

let mut engine:engine::Engine<Box<GameThing>> = engine::Engine::new(engine::WindowSettings::default());

Here, you're trading off the cost of dereferencing the Box on every access for the benefit of large objects not taking up more space in the objects vector than other objects. Following this line of thought, we could let every object have its own associated data of any reasonable type, and at runtime downcast to some specific type:

pub trait GameThing:'static {
    // To downcast we need a &mut dyn Any for the underlying concrete type;
    // implementors just return self here.
    fn as_any_mut(&mut self) -> &mut dyn std::any::Any;
}
struct Engine {
    //...
    objects: Vec<GameObject>,
    //...
}
struct GameObject {
    transform:Isometry3,
    data:Box<dyn GameThing>
}
impl GameObject {
    pub fn data_mut<T:GameThing>(&mut self) -> Option<&mut T> {
        // Careful: casting &mut self.data directly to &mut dyn Any would
        // downcast the Box itself (not its contents) and always fail.
        self.data.as_any_mut().downcast_mut::<T>()
    }
}
// in the game...
#[derive(Debug)]
struct GenericGameThing {}
impl engine::GameThing for GenericGameThing {
    fn as_any_mut(&mut self) -> &mut dyn std::any::Any { self }
}

engine.play(|engine| {
    for obj in engine.objects_mut() {
        obj.move_by(Vec3::new(1.0, 1.0, 0.0) * DT);
        dbg!(obj.data_mut::<GenericGameThing>());
    }
})

Now, the play function is starting to look a little suspect. Often, we want to localize behavior in objects rather than having global control over everything all at once. So we might extend the GameThing trait with an update function:

pub trait GameThing:'static {
    fn update(&mut self, trf:&mut Isometry3);
}

So far so good, but what if one object has to make queries involving input or other objects? In a language like C# or Java we'd want to do something like this:

pub trait GameThing:'static {
    fn update(&mut self, trf:&mut Isometry3, eng:&mut Engine);
}

And we'd change play like so, replacing the f function that updates the game world:

//f(&mut self);
for obj in self.objects.iter_mut() {
    obj.data.update(&mut obj.transform, &mut self);
}

Uh-oh:

error[E0499]: cannot borrow `self` as mutable more than once at a time
   --> scene3d/src/engine.rs:173:65
    |
172 |         for obj in self.objects.iter_mut() {
    |                    -----------------------
    |                    |
    |                    first mutable borrow occurs here
    |                    first borrow later used here
173 |             obj.data.update(&mut obj.transform, &mut self);
    |                                                 ^^^^^^^^^ second mutable borrow occurs here

Welp. That's a classic two-mutable-borrows situation, since self.objects.iter_mut() borrows through self, and we need to pass &mut self to the update function. We can split the borrow if we only need to pass the inputs through…

pub trait GameThing:'static {
    fn update(&mut self, trf:&mut Isometry3, inp:&Input);
}
// in play...
let input = &self.input;
for obj in self.objects.iter_mut() {
    obj.data.update(&mut obj.transform, input);
}

And this works great! But now one object's update can't touch or even read from other objects. We can try to split the borrow again to deal with that:

pub trait GameThing:'static {
    fn update(&mut self, trf:&mut Isometry3, inp:&Input, others:&mut [engine::GameObject]);
}
// in play...
let input = &self.input;
for obj in self.objects.iter_mut() {
    obj.data.update(&mut obj.transform, input, &mut self.objects);
}

But this is clearly also a two-mutable-borrows situation. It's not iter_mut()'s fault either; if we iterated through indices, we'd still need to call self.objects[idx].update(..., &mut self.objects), so we'd have two live mutable borrows of self.objects. Usually, if we need to provide both a slice and mutable access to an element of that slice, we'd do something like passing the mutable slice and a usize index into that slice, but we can't do that here since we need obj to call update. We can, however, temporarily remove each obj from self.objects as we process the slice, call update with the rest of the objects, and then put it back when we're done. This sounds gnarly (and we'd probably need to use an index to iterate rather than iterators), and while it would work we'd waste a lot of time shuffling data around.
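
For completeness, here's a sketch of that idea using swaps instead of removals (note that the other objects' indices get scrambled for the duration of each update call):

// in play...
let input = &self.input;
for idx in 0..self.objects.len() {
    let last = self.objects.len() - 1;
    // Swap the current object to the end so the others form one contiguous slice
    self.objects.swap(idx, last);
    let (rest, obj) = self.objects.split_at_mut(last);
    let obj = &mut obj[0];
    obj.data.update(&mut obj.transform, input, rest);
    // Swap it back into place before moving on
    self.objects.swap(idx, last);
}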

Rust really doesn't want us to pass around a giant bunch of objects with complex containment relationships. Why is this not a problem in other programming languages? In C++, it actually can be a problem, but it may just exhibit as weird behaviors—it's easy to imagine a bug where some object's update accidentally processes itself (e.g., a missile checks that it's colliding against itself and destroys itself). In Java or C#, the problem shows up just the same except through another layer of indirection: Java and C# references are like a Rust Arc<RefCell<T>>. So we could wrap every object in a RefCell and pay the cost of indirection (and runtime borrow checking) on every access:

use std::cell::{RefCell,RefMut};
use std::rc::Rc;
struct Engine {
    //...
    objects: Vec<Rc<RefCell<GameObject>>>,
    //...
}
// in play... notice that we need to explicitly split the borrow!

for obj in self.objects.iter() {
    let (mut trf,mut dat) = RefMut::map_split(obj.borrow_mut(),
                                              |obj| (&mut obj.transform,&mut obj.data)
    );
    dat.update(&mut *trf, inp, &self.objects);
}

This way in update we can borrow_mut() any object we need from objects, and the RefCell will check at runtime that there are no extant borrows (panicking if there are; for instance, if an object tries to borrow_mut() itself). Again, why is this not a problem in Java or C#? Actually, it is! In C# we have the distinction between structs and classes (structs are by-value, classes are by-reference), and in Java we just eat the cost of chasing pointers everywhere all the time (this is the primitive vs object distinction).

So let's think of another approach. If objects in general don't have much data associated with them, we could have update take &self and a slice of the other objects and return a new (Isometry3, GameThing); then we'd lose the ability for objects to modify other objects in update, but we'd fix the borrowing issue (at the expense of keeping two copies of the game state, though we might actually want that anyway, e.g. for interpolated rendering). So this isn't great either. How can we balance generic enough and fast enough?

Well, the issues here are that (1) individual game objects have data, (2) when updating, an object must access the data of the other game objects, and (3) we insist that individual objects define their own behavior. If we relax any of these constraints, the problem is way easier. We already saw that a global update function (3) avoids the issue. If we disallow modifying other objects (2), we also avoid the issue, but then we need another mechanism for objects to affect each other (for example, adding to a queue of "update actions", as sketched below). If we don't associate data with individual game objects (1), we can also avoid the mutable borrow by passing around whole groups of objects at once; if those objects live in different memory, there's no chance of aliased mutable borrows. Entity-component systems change up these assumptions drastically; simple examples like this tiny ECS allow for arbitrary "components" to be associated with each object while "systems" process all the data of each component (or groups of components) in bulk. Rust has a variety of ECS crates to choose from, including specs, bevy, legion, and more. There's also a great talk on the subject of ECS designs vs others in Rust.
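
Here's the update-actions sketch promised above; the Action variants and the indexing scheme are hypothetical:

use ultraviolet::Vec3;

// Each cross-object effect the game needs becomes an action variant
enum Action {
    MoveBy(usize, Vec3), // which object (by index), and how far
    Destroy(usize),
}

// GameThing::update would take actions:&mut Vec<Action> instead of &mut Engine.
// Then in play, after every object has updated:
for act in actions.drain(..) {
    match act {
        Action::MoveBy(obj, delta) => self.objects[obj].transform.translation += delta,
        // Careful: removal shifts later indices; a real engine would use
        // stable handles (see the arena discussion below) or defer removals.
        Action::Destroy(obj) => { self.objects.swap_remove(obj); }
    }
}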

Just Vecs of Things

Having one giant Vec of game objects is fine. It means everything is positioned in world space (easy to understand, but sometimes difficult to make things move in sync) and it's simple. You probably still want to separate static objects like terrain or environment geometry from the rest of the objects, but even if you don't this can work. For games where everything is visible most of the time, this is completely adequate. In larger worlds, though, most of the game objects will be invisible (they'll be in other rooms, or too far away, or behind the camera)—and while it may or may not make sense to update objects that are far away, it definitely does not make sense to render objects that are far away. Still, we'd need to march through the entire vec to check to see if something should be drawn or not, and this can get expensive.

Better, maybe, to have a vec of currently-important (visible) things and a vec of less-important (and less-visible) things, and shuffle stuff back and forth between those. This doesn't solve the issue completely—you still have to look at every object every frame to maybe move it from one category to another—but it can help. Eventually you might find you want multiple "tiers" of importance, and at that point it's worth investigating a fancier representation.

Scene Graphs: A Tree of Things

The most common representation used in commercial game engines is probably the scene graph. In a scene graph, objects are arranged in a tree rooted at the scene, and objects inherit the transforms of their parents. This means, for example, that you can position the wheels of a car relative to its chassis and as the car moves its parts will all move together. It also means that we can relatively cheaply compute bounding boxes for objects and their (transitive) children, to quickly check whether a part of the scene is in view (or possibly colliding with another object). Scene graphs can offer a good compromise between ease of authoring and efficiency.

First, some theory. When we have a tree (or hierarchy) of transforms, we can't avoid thinking about conversions between frames of reference (or coordinate spaces). The scene's root is the base coordinate system or world space. Its children are positioned and oriented somewhere in that space, but their children are positioned relative to their parents. For example, if the scene contains a car at (5.0, 0.0, 0.0) with no rotation, and the car has a hood ornament in its forward-z direction at (0.0, 1.0, 2.0), then in world coordinates the hood ornament is at (5.0, 1.0, 2.0) (which we obtain by concatenating the car transform and the hood transform: \(T_{car} * T_{hood} * (0,0,0)\) will give us the position of the hood ornament in world coordinates). We saw this already in rendering: proj * view * model gives us a model-space-to-clip-space transform.

If we want to know whether the hood ornament is colliding with a soccer ball (whose position is in world coordinates), we need to first convert both objects' positions and sizes into some common frame of reference. Usually, we would pick one object (A) as the frame of reference and convert the other's (B) transform by first converting from B space to world space (composing the transforms of B) and then from world space down to A space by using A's inverse transform. To convert a vector \(v_{local}\) in frame Z (local) up to frame A (global), you'll want a transform like \(A * B * C * \ldots * Z * v_{local}\). To convert the other way (global to local, from A down to Z), you invert each transformation and reverse the order (each \(T\) becomes \(T^{-1}\)): \(Z^{-1}*Y^{-1}*\ldots*A^{-1} * v_{global}\).
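
Concretely, with ultraviolet's isometries and the car example above (car_trf and hood_trf assumed):

use ultraviolet::{Isometry3, Vec3};

// car_trf places the car in world space; hood_trf places the ornament
// relative to the car
let world_pos = car_trf.transform_vec(hood_trf.transform_vec(Vec3::zero()));
// And back down: convert a world-space point into the ornament's own frame
let hood_pos = hood_trf.inversed().transform_vec(car_trf.inversed().transform_vec(world_pos));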

The web also has plenty of useful resources on transformations, coordinate spaces, and transform hierarchies.

It may make sense to compute these transforms as needed and cache them during the current frame, or come up with some other mechanism to minimize the number of transforms you need to compute. At a minimum, everything that needs to be rendered will need its model matrix computed (that's the \(A * B * C * \ldots * Z\) one) for rendering!

Containment vs Constraints

We might realize our scene graph like so:

struct GameObject {
    children:Vec<GameObject>,
    local_transform:Transform, /* e.g., Isometry3 */
    parent_frame:Transform,    // identity for the root
    parent_inverse:Transform,  // likewise
    // ...
}

Then every time something moves, we update the transforms and inverse transforms of its children (and so on, recursively). We don't necessarily want to nest GameObjects like this, though—we'll end up chasing pointers all over memory because Vecs are heap-allocated. We could avoid this with a crate like smallvec which uses inline space for small-enough vectors and spills to a new allocation otherwise, but there are other reasons we may not want a tree set up this way. Another option is to store a Vec<GameObjectRef>, and look them up as needed. In this situation we'd also really like to make sure parents are always processed during a frame before children, and that the parent can't change position after its children are processed, or else we'd need to invalidate the cache or propagate those changes back through.

Instead of storing who-knows-how-many child pointers, we could instead store a single parent pointer and compute these matrices as needed:

struct GameObject {
    // Maybe should be an Option, since root won't have a parent, or could use self?
    parent:GameObjectRef, // Or an Rc<RefCell<GameObject>>, but that's déclassé
    local_transform:Transform, /* e.g., Isometry3 */
}

But then it becomes expensive to iterate through the children of an object. Maybe you want both child and parent pointers! We don't need to worry about parents moving after children are moved if we compute these transforms lazily, so maybe it's okay not to cache them, or only to cache them within a single function. Just remember that you will eventually need to compute the local-to-global transform matrix for every object that needs to be rendered.
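
A sketch of that lazy computation, assuming a Copy-able GameObjectRef that's basically an index and an Option for the root's missing parent:

fn local_to_global(objects: &[GameObject], obj: GameObjectRef) -> Isometry3 {
    let mut trf = objects[obj.0].local_transform;
    let mut cur = obj;
    // Climb toward the root, composing each ancestor's transform on the left
    while let Some(parent) = objects[cur.0].parent {
        trf = objects[parent.0].local_transform * trf;
        cur = parent;
    }
    trf
}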

We could store this hypothetical cache globally rather than in each GameObject:

struct Engine {
    local_to_global:Vec<Isometry3>, // Assume a GameObjectRef is basically a usize
    global_to_local:Vec<Isometry3>, // and we can index into these vecs
    objects:Vec<GameObject>
}

We'll use a trick something like this for skinned vertex animation later on, since an animation pose will be fixed during rendering and we have that strong guarantee that computing a transform won't be invalidated by a later movement.

Another way to relate parents and children is to separate the link graph from the objects themselves:

struct Engine {
    objects:Vec<GameObject>,
    parents:Vec<GameObjectRef>,
    children:Vec<SmallVec<[GameObjectRef;4]>>
}

One issue with any of these vec representations is that reparenting an object can be expensive—you have to update links, and if you need to process parents before children you may need to reorder a lot of stuff in the list. Still, reparenting is rare relative to "iterating through all the objects", so this tradeoff is usually worth it.

Arenas

OK, but vecs are still a little unpleasant. If you remove objects from the game, all the IDs will change, and if you use Vec<Option<GameObject>> that has its own space and time drawbacks. You could use a HashMap of GameObjectRef or usize or whatever, but there's overhead associated with inserting, retrieving, and iterating over values. At this point we can bring in a new crate: thunderdome. Thunderdome is an arena allocator that will let us grab new things as we need them and reuse their storage when we're done with them.

[dependencies]
thunderdome="0.5.0"

use thunderdome::{Arena,Index};

pub struct GameObjectRef(Index);
pub struct TransformParent {
    parent:GameObjectRef,
    local_to_global:Isometry3,
    global_to_local:Isometry3
}
struct Engine {
    objects:Arena<GameObject>,
    // just an example: we'll store parent links out of band to illustrate,
    // and make the indices line up.
    parents:Arena<TransformParent>
}

impl Engine {
    pub fn add_object(&mut self, trf:Isometry3, parent:Option<GameObjectRef>) -> GameObjectRef {
        let obj=self.objects.insert(GameObject{transform:trf});
        if let Some(parent) = parent {
            // You'd use similar code to the below when updating an object's transform.
            let parent_go = &self.objects[parent.0];
            let parent_trfs = self.parents.get(parent.0);
            let local_to_global = parent_trfs
                .map(|p| p.local_to_global * parent_go.transform)
                .unwrap_or(parent_go.transform);
            // Inverting a composition reverses the order: (P*T)^-1 = T^-1 * P^-1
            let global_to_local = parent_trfs
                .map(|p| parent_go.transform.inversed() * p.global_to_local)
                .unwrap_or_else(|| parent_go.transform.inversed());
            self.parents.insert_at(
                obj,
                TransformParent{
                    parent,
                    local_to_global,
                    global_to_local
                }
            );
        }
        GameObjectRef(obj)
    }
    pub fn remove_object(&mut self, go:GameObjectRef) {
        self.objects.remove(go.0);
        self.parents.remove(go.0);
    }
}

You could also use arenas for textures and meshes and similar assets!

This doesn't give us "parents before children" iteration, but we could regain that by making our parents storage a little fancier:

struct Engine {
    // ... instead of =parents=, store the tree "unfurled" into depth 0, depth 1, depth 2... etc.
    // This uses an arena so we can use GameObjectRefs to find an object at the level of the hierarchy
    // where it lives, but a Vec would be okay too especially if this hierarchy structure is
    // reinitialized every frame.
    hierarchy:Vec<Arena<HierarchyEntry>>
}
struct GameObject {
    hierarchy_level:usize,
    //...
}
struct HierarchyEntry {
    parent:GameObjectRef,
    // reminder: the smallvec crate gives us a small array which can 'spill' into a vec if it grows
    children:SmallVec<[GameObjectRef;4]>,
    // maybe:
    local_to_global:Transform,
    global_to_local:Transform
}

Now, when an object is added to a parent, we find the hierarchy level and entry for the parent, place this object one level deeper, and add it to the hierarchy arena at the corresponding index. Reparenting is achieved by updating the hierarchy entry and (if necessary) removing it from the current level and adding it to the new level, recursively updating the levels of any children of the reparented object. This lets us process our nodes in something like a breadth-first order, and if objects are allowed to modify other objects we can catch moments where a node's transform is modified and fix things up neatly.

Spatial Partitioning

Regardless of how the hierarchy is represented, it gives us a few useful tools for splitting space. First off, we can find a bounding volume (for example, an oriented box or a sphere or a cylinder) for each object, and take the unions of those volumes as we go up the hierarchy. That means that we know, at every level of the hierarchy, what sub-graphs may be touching which other sub-graphs, which is useful for making collision detection efficient. This can be less helpful if objects in the scene are positioned very far from their parent, but it's still better than nothing!
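
For bounding spheres, the union operation might be sketched like this (the Sphere type is an assumption):

use ultraviolet::Vec3;

#[derive(Clone, Copy)]
struct Sphere {
    center: Vec3,
    radius: f32,
}

// The smallest sphere containing both input spheres
fn merge(a: Sphere, b: Sphere) -> Sphere {
    let d = (b.center - a.center).mag();
    if a.radius >= d + b.radius { return a; } // b already inside a
    if b.radius >= d + a.radius { return b; } // a already inside b
    let radius = (d + a.radius + b.radius) / 2.0;
    // Slide the center from a toward b just far enough to cover both
    let center = a.center + (b.center - a.center) * ((radius - a.radius) / d);
    Sphere { center, radius }
}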

We can also decide what objects to render using the scene graph: by projecting a volume (a frustum) out from the camera, we can intersect that volume with nodes in the scene graph to see if their bounding volumes intersect with the frustum. If not, we can skip those nodes completely!
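
And the visibility test itself can be as cheap as comparing a bounding sphere against the frustum's six planes (a sketch, reusing Sphere from above and assuming inward-pointing unit normals):

struct Plane {
    normal: Vec3, // unit length, pointing into the frustum
    d: f32,
}

// A bounding sphere survives culling unless it lies entirely behind
// one of the six frustum planes
fn sphere_touches_frustum(s: &Sphere, planes: &[Plane; 6]) -> bool {
    planes.iter().all(|p| p.normal.dot(s.center) + p.d >= -s.radius)
}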

There are other ways to partition space as well (octrees, grids, binary space partitioning, and more), and we may talk about those when we address collision later.

Many Worlds

Regardless of how scenes are presented, it may make sense to allow for more than one scene simultaneously. For example, you might want to load and unload "scenes" as the player moves through a large environment. If objects in different scenes can interact, this should be done with care, but if objects only rarely move between scenes and don't interact between scenes then managing multiple scenes can be more efficient than using one giant world.

struct Engine {
    scenes:Vec<Scene>,
    // ...
}
impl Engine {
    // update and render each active scene
}
// Sometimes these are called "Worlds" instead
struct Scene {
    objects:Arena<GameObject>,
    // ...
}