Robert Grzeszczuk (rg)
Wed, 23 Jun 1999 11:24:35 -0700 (PDT)
Thnaks for your feedback. Some comments below.
On Jun 23, 6:30pm, Manfred Weiler wrote:
> Subject: Questions concerning Volumizer design.
>
> In the Volumizer API different geometric primitives are supported. But for
> poligonalization all the higher level primitives are tesselated into
> tetrahedrons. This leads to a significant overhead for such primitives like
> hexahedrons which could be directly slices more efficient than the five
> tetrahedrons the tesselation produces. This applies especially to the
standard
> case where s single volume should be rendered with a TetraSet covering
exactly
> the bounding box of the volume.
> When will there be direct support to (some of) the vout-primitives in the
> future or why is this not planed.
> For the standard case it is perhaps a good idea to make it possible that a
> volume can be polygonizes without a TetraSet (NULL-Pointer).
There are several performance, algorithmic, architectural, engineering, and
philosophical issues here.
First, I wouldn't say that the overhead is "significant." Polygonizing a cube
tesselated into 5 tetrahedra coinciding with the bouding box of the volume
rarely takes more than 2 ms per brick even on a single processor (FSG version
may leverage multiple processors to cut down on host computation).
I would also disagree that a box can be tesselated significantly faster. In
the current 1.1 implementation of polygonize() uses a very efficient
"SweepingTetrahedra" algorithm, if certain conditions are met (e.g., a
tetrahedron falls entirely within the clip box of the brick). This procedure
takes advantage of the fact that a tetrahedron is subdivided into four distinct
spans (in a non-degenerate case) as a sweeping plane crosses each of the
vertices. In the first and last span the intersection of the plane with the
tetrahedron will be a triangle, in the middle--a quadrangle. All polygons
within a span are similar and can be obtained via interpolation. This is
*really* fast.
The same algorithm could be extended to "SweepingCubes" with up to 15 spans and
3-10 vertex polygons . However, this algorithm might be more involved to
implement and debug, particulalry given all the degenerate cases that need be
considered. Furthermore, chances are that the spatial coherence will be less
prominent (spans are shorter), and the benefit smaller. You may end up having
to use "MarchingCubes." In the end it may be a wash (we are open to hard
evidence, though).
The architectural argument for not supporting direct rasterization of
specialized cases is analogous to that adopted in OpenGL. For example, axis
aligned rectangles can be scan-converted significantly faster than any other
primitive. And they are quite common in 2D graphics. Yet, only tri-strips and
tri-fans are efficiently supported. It's even worse for higher order
primitives.
The engineering arguement is that it is best to develop, debug, and maintain a
minimal and complete set of functionality, all things (e.g., performance) being
equal. Currently, I see no reason to put extra effort into this problem
(performance for a single cube is certainly there).
Philosophically, having a special case, where a TetraSet is NULL, and the
volume is assumed to cover the whole BrickSetCollection breaks down the
Volumetric paradigm paradigm. Essentially, it would be like requesting that
glBegin()/glEnd() sequence that enclosed only texture coordinates and no vertex
coordinates, should use a heurstic for extracting them from the currently bound
texture. Nastiness, I'm telling you!
>
> On the other hand for most cases I see no need to create a (sometimes huge
and
> only parsely populated) transient data structure. Especially on low-end
> workstations without 3D texture interpolation this structure consumes much of
> the rare main memory due to the great number of "bricks". In addition to that
> I guess that working with this structure reduces performance because of quite
> a number of system calls (memory (de-)allocation, creating/deleting object on
> the stack).
The point about sparsity of the transient data structure is well taken. For
example, the case on N 2D images sampled with N planes, one only needs N
polygons. However, the current architecture requires N*N storage (each of the
N images/bricks has a separate list of FaceSets that is N long; the single
slicing rectangle is saved in FaceSetLists[N][N]). We have been evaluating
alternatives for a more efficient storgae of transient geometry. (However,
there are no system calls ar any allocation/deallocation that directly affects
performace).
One possible solution for the above problem we have been kicjing around is to
overlaod polygonize() to take an additional parameter: an array of offsets (int
offsets[N]). For each brick B, offset[B] would contain the position of the
first non-empty FaceSet. For example, consider a 2 brick volume stacked in Z.
Each brick is 128^3. the transient structure has [2][256] FaceSets. When
viewed along Z axis with isotropic sampling rate of 1.0, the first brick
currently fills [0][0-127], while the 2nd [1][128-255] leaving the remaining
FaceSets empty. the new approach would allocate only [2][128] FaceSets and
return offsets[2] = { 0, 128}. Draw routine could still process the polgons in
globally sorted order of required. In 2D case, that would allow for allocating
[N][1] FaceSets.
> I admit that using this structure is more flexible but I think that
> display lists could do the same job and the vertices and polygons that should
> be drawn are stored where they will be needed.
> Maybe you will object that the transient geometry can be cached between the
> frames. But this leads to errors in the resulting image when the view
> parameters change, so for correct images slicing has to be performed in every
> frame.
> So why not add a draw method to Volumizer which works totaly without
transient
> geometry?
>
As you point out the main advantage of this scheme is flexibility. For
example, one does not have to apply visibility sorting to the the tetrahedra
(they need to be drawn in order). This becomes particularly important in
muti-part rendering (suppose that you have a TetraSet describing the mandible,
and another that describes the rest of the skull; you need to sort them first).
Similarly, consider a multi-volume case, wher two mis-aligned volumes
intersect. In same cases (e.g., if the volumes are stacks of 2D images, not 3D
textures), you may have to polygonize them separately, and do a proper
visibility sort on the sampling geometry (e.g., form a BSP tree). Even in the
simplest case, of a hexahedron tesselated into 5 tetras, you need to
sample/draw the furthest tetrahedron first, before you start sample/draw for
the next.
So I gather, that your question is again "Why not have a special 'fast-path'
thaa only works on a single cubic volume?" The answer is: for pretty much the
same resons as listed above: it is not a minimal design, and the problems you
identified can be solved differently, without compromising the paradigm. For
the same reason, you cannot send a NURB surface down on OpenGL pipe (the pipe
would have to do tesselation as well as rasterisation, which complicates things
too much).
Again, thanks for you comments. We went through many of these issues a few
months ago, and none of the decisions were easy. If any of the above does not
make sense, please rebuttle. We've been known to change our minds ... :-)
-rg
>
> Manfred.
>-- End of excerpt from Manfred Weiler
This archive was generated by hypermail 2.0b2 on Mon Nov 01 1999 - 14:02:51 PST