Is there a specific reason all world objects should be rendered by a single “main render-pass” node?
I think the current encoder code should be interpreted as a refactor that allows us to approach that problem at all, rather than as a more-or-less final solution.
I see that the general idea here is to be able to declare what the pass needs and, based on that, run only the encoders that provide that data. We could split the existing encoders into multiple parallel ones, but that can easily explode into systems that are too fine-grained and could potentially block each other. Also, as Viral noted, there might not be enough memory to fit it all at once. That might warrant some queuing, but IMHO we can ignore the problem for now. (You surely have more system RAM than video RAM, right?) I tried to implement those encoders as a layer above systems, so that combining them would be possible, but failed due to some type-system trickery around `Join`. I'd be glad to try again with you as another iteration on top of what we have.
I have some ideas about what the data flow could be, but the existing shred implementation might be too limited to handle it. Specifically, encoding could indeed be separated per "data kind": colors, positions, or any uniform/varying/const etc. data. There could be an encoder for "2d position", "tint color" or "albedo texture". I think the engine could have many such predefined "slots", possibly allowing a `Custom(&'static str)` enum variant for user-defined things, or something like that. Once the required layout is determined based on the current render graph node, the encoders would be scheduled to run with a specific buffer destination and stride. The tricky part: multiple encoders mutating the same buffer without locking (possible only thanks to the stride). Next, the buffer would be post-processed for things like sorting (which might also be done based on types, e.g. depth sorting only affecting buffers with position data), and then the pass would just be handed final buffers that can be copied straight into the corresponding Rendy objects. Because buffers are cleared only once per frame, the same buffers could potentially be reused across many different passes.
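To make the slot-and-stride idea concrete, here is a minimal dependency-free sketch. All names (`EncodingSlot`, `write_strided`, the `"wind_strength"` custom slot) are illustrative inventions, not engine API; the point is just that each encoder owns a disjoint byte sub-range of every stride-sized element, so two encoders can fill the same interleaved buffer without locking.

```rust
// Hypothetical set of predefined "data kind" slots, with a user-defined escape hatch.
#[derive(Debug)]
enum EncodingSlot {
    Position2d,
    TintColor,
    Custom(&'static str), // user-defined data kinds
}

/// Write one slot's data into an interleaved buffer at a fixed offset.
/// Each encoder writes only its own offset..offset+len window inside every
/// stride-sized element, so encoders never overlap and need no locks.
fn write_strided(buffer: &mut [u8], stride: usize, offset: usize, data: &[&[u8]]) {
    for (element, bytes) in buffer.chunks_mut(stride).zip(data) {
        element[offset..offset + bytes.len()].copy_from_slice(bytes);
    }
}

fn main() {
    // Layout decided by the current render graph node:
    // [pos2d: 8 bytes][tint: 4 bytes] per element.
    let _layout = [EncodingSlot::Position2d, EncodingSlot::TintColor, EncodingSlot::Custom("wind_strength")];
    let stride = 12;
    let mut buffer = vec![0u8; stride * 2];

    // "Position" encoder owns bytes 0..8 of every element...
    let pos: Vec<Vec<u8>> = vec![
        [1.0f32.to_le_bytes(), 2.0f32.to_le_bytes()].concat(),
        [3.0f32.to_le_bytes(), 4.0f32.to_le_bytes()].concat(),
    ];
    write_strided(&mut buffer, stride, 0, &[&pos[0], &pos[1]]);

    // ...while the "tint" encoder owns bytes 8..12, so the two never conflict.
    write_strided(&mut buffer, stride, 8, &[&[255, 0, 0, 255], &[0, 255, 0, 255]]);

    assert_eq!(&buffer[0..4], &1.0f32.to_le_bytes());
    assert_eq!(&buffer[20..24], &[0, 255, 0, 255]);
}
```

The post-processing step (e.g. depth sorting) would then permute whole stride-sized elements, keeping each entity's interleaved data together.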
All of this may well require some changes in the ECS.
I like the idea of parallel fine-grained encoders, but I am unable to see how they can cooperate. They can't just write data into a `Vec` linearly, as data from the same entity would be spread across different indices. Encoders writing data directly into GPU buffers and descriptor sets would not work either, for the same reason.
Imagine two types of objects. They are rendered by two different pipelines with different layouts. Both need `Transform` data, but the first needs data from component `Foo` and the second needs data from component `Bar`. The objects can be interleaved, so the `Transform` encoder would visit both objects with `Foo` and objects with `Bar`. How, then, would a pass that knows nothing of the `Foo` and `Bar` types and encoders know which object's data sits at which offset?
I imagine encoders not as systems but as special handlers that fetch data from the `World` on demand. The pass, iterating through entities with a `Renderable` component, decides which encoders it needs and allocates ranges in buffers and descriptor sets based on data in the `Renderable` component (most of that data it should get directly from the shaders attached to the `Renderable`). Then the pass uses the encoders to populate buffers and descriptor sets, and records a draw call into the command buffer to which the correct pipeline is attached. In the next frame, the pass will not update buffers and descriptors for this `Entity` if the relevant components are unchanged.
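A minimal sketch of this "encoders as on-demand handlers" shape, with all names (`World`, `Encoder`, `TransformEncoder`, `FooEncoder`) hypothetical and the ECS replaced by plain `HashMap`s: the pass picks the encoder set per entity, allocates a contiguous range, and hands each encoder only its sub-range, so the pass alone knows the final layout.

```rust
use std::collections::HashMap;

// Minimal stand-in for the ECS World; real component storages differ.
struct World {
    transforms: HashMap<u32, [f32; 2]>,
    foos: HashMap<u32, f32>,
}

// Encoders fetch data from the World on demand; the pass tells each one
// where to write, so encoders never need to know the overall layout.
trait Encoder {
    fn size(&self) -> usize;
    fn encode(&self, world: &World, entity: u32, dst: &mut [u8]);
}

struct TransformEncoder;
impl Encoder for TransformEncoder {
    fn size(&self) -> usize { 8 }
    fn encode(&self, world: &World, entity: u32, dst: &mut [u8]) {
        let [x, y] = world.transforms[&entity];
        dst[0..4].copy_from_slice(&x.to_le_bytes());
        dst[4..8].copy_from_slice(&y.to_le_bytes());
    }
}

struct FooEncoder;
impl Encoder for FooEncoder {
    fn size(&self) -> usize { 4 }
    fn encode(&self, world: &World, entity: u32, dst: &mut [u8]) {
        dst.copy_from_slice(&world.foos[&entity].to_le_bytes());
    }
}

fn main() {
    let mut world = World { transforms: HashMap::new(), foos: HashMap::new() };
    world.transforms.insert(7, [1.0, 2.0]);
    world.foos.insert(7, 0.5);

    // The pass would choose this encoder set per entity, e.g. based on
    // the shaders attached to its Renderable component.
    let encoders: Vec<Box<dyn Encoder>> = vec![Box::new(TransformEncoder), Box::new(FooEncoder)];

    // Allocate one contiguous range for entity 7; each encoder fills its sub-range.
    let total: usize = encoders.iter().map(|e| e.size()).sum();
    let mut buffer = vec![0u8; total];
    let mut offset = 0;
    for enc in &encoders {
        enc.encode(&world, 7, &mut buffer[offset..offset + enc.size()]);
        offset += enc.size();
    }

    assert_eq!(&buffer[0..4], &1.0f32.to_le_bytes());
    assert_eq!(&buffer[8..12], &0.5f32.to_le_bytes());
}
```

Caching per entity (skipping unchanged components in the next frame) would sit on top of this, e.g. keyed on the allocated range.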
But maybe you have another solution on your mind.
I understand how iterating toward a big goal is important, but we need this big goal to be well defined, and we should all be on the same page. Otherwise we risk going in the wrong direction.
I'm going to try to summarize the discussions in this GitHub issue about Encoders.
Summary of raised issues/concerns
omni-viral: Frizi's `Encoder` in the PR is too specific and not data-driven enough. It's not extensible: if users want to change the shader and add some data in `World` components, they would have to copy the entire built-in `Encoder` and then add their fields to their custom implementation.
I think this is addressed with the design I propose below.
`O(n)` is not good enough. We should support sending only the changed data to the GPU every frame, as `k`, the number of changes, is usually an order of magnitude smaller than `n`.
It's still unclear how to do this generally while maintaining good per-element performance. I think it's somewhat possible if we use specs modification events, but there would still be a level of indirection to map an entity to an offset in the GPU buffer, which is not necessary with an `O(n)` approach that rewrites the buffers every frame from the current entity set.
In my opinion we should start by making the `O(n)` approach fast, with very low per-element overhead. This plays to the strengths of current CPUs, where linear access patterns give obvious performance wins, and it has consistent performance regardless of how the game modifies entities. It is probably also easier to implement.
I've had a go at designing a somewhat data-driven approach to `World` data extraction.
A few observations about the problem:
- We know the layout of Components at compile time.
- We don't know, at compile time, the layout of the Pipeline buffers that Component data will be written to (not without the current Pass type for a specific object kind plus a known shader struct).
- We want to move fields/data from Components into Pipeline buffers.
I wrote the following example of a simple `Encoder` implemented for a set of component types. Let me know what you think.
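The example code itself did not survive here, so the following is only a guess at its shape: an `Encoder` defined for a set of component types, run inside a `join`-style loop. All names (`Encoder`, `EncodeBuffer`, `SpriteEncoder`) are hypothetical, and a plain `zip` stands in for a specs `join`.

```rust
/// Per-frame destination the encoder writes into.
struct EncodeBuffer {
    data: Vec<u8>,
}

impl EncodeBuffer {
    fn push(&mut self, bytes: &[u8]) {
        self.data.extend_from_slice(bytes);
    }
}

/// An encoder defined for a set of component types: it states at compile
/// time which components it reads and how it serializes one entity's data.
trait Encoder {
    type Components<'a>;
    fn encode<'a>(item: Self::Components<'a>, out: &mut EncodeBuffer);
}

struct SpriteEncoder;

impl Encoder for SpriteEncoder {
    // Would be (&Transform, &Tint) in a real ECS; plain arrays here.
    type Components<'a> = (&'a [f32; 2], &'a [u8; 4]);

    fn encode<'a>((pos, tint): Self::Components<'a>, out: &mut EncodeBuffer) {
        out.push(&pos[0].to_le_bytes());
        out.push(&pos[1].to_le_bytes());
        out.push(tint);
    }
}

fn main() {
    // Stand-in for a join over all entities that have both components.
    let transforms = vec![[0.0f32, 1.0], [2.0, 3.0]];
    let tints = vec![[255u8, 0, 0, 255], [0, 255, 0, 255]];
    let mut buf = EncodeBuffer { data: Vec::new() };
    for item in transforms.iter().zip(tints.iter()) {
        SpriteEncoder::encode(item, &mut buf);
    }
    // Two entities, 12 bytes each (two f32 positions + RGBA tint).
    assert_eq!(buf.data.len(), 24);
}
```

The questions below seem to be reactions to details of the original example (its pipeline-returning function and field-to-shader linking) that are not reproduced here.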
This function just returns the pipeline from the renderable. So how does the renderer know that this pipeline and the data this encoder encodes are compatible?
Can `Encoder`s be combined with this approach? If yes, then how? Do you try every known encoder on every entity?
Some components contain resources, not copyable data that can be put into a buffer.
This can only be known after running it. This function links the fields with shader metadata, so it knows where to write the data. It would probably return a `Result` instead.
I suppose? My thinking is that the `run_encoder` function should run inside of a `join` loop, so you can run any number of these encoders. Is that what you mean?
Every registered encoder would run a `join` loop based on its component set.
`EncodeTarget` supports any type with just a bit of `unsafe` code, and you can also write inner handles (the `u32`) with it. So resources would work; the pointer would just not point into a GPU buffer, but to some other place on the heap.
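A small illustration of the handle idea (names and the `Texture` type are made up, and the byte-`Vec` target is a safe stand-in for the real `EncodeTarget`): the resource itself stays in a table elsewhere on the heap, and only its inner `u32` handle is written through the target, for the pass to resolve into a descriptor later.

```rust
/// Safe stand-in for an encode target backed by raw bytes.
struct EncodeTarget {
    bytes: Vec<u8>,
}

impl EncodeTarget {
    /// Write a plain value as raw bytes; for a resource we write its
    /// inner handle rather than the resource itself.
    fn write_u32(&mut self, value: u32) {
        self.bytes.extend_from_slice(&value.to_le_bytes());
    }
}

/// Stand-in for a GPU resource that cannot be memcpy'd into a buffer.
struct Texture {
    id: u32, // the inner handle
}

fn main() {
    // The resources live in a table on the heap; the encoded buffer only
    // carries u32 handles pointing into that table.
    let textures = vec![Texture { id: 0 }, Texture { id: 1 }];
    let mut target = EncodeTarget { bytes: Vec::new() };
    target.write_u32(textures[1].id);
    assert_eq!(target.bytes, vec![1, 0, 0, 0]);
}
```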