To achieve top performance, you must use efficient geometry formats. Painting triangles with glBegin(GL_TRIANGLES) is simply not enough to achieve decent performance. Here are some geometry representation hints to speed up your application.
Use strips or fans as much as you can. They use less memory to represent the same amount of geometry: a strip of N triangles needs only N+2 vertices, whereas a plain triangle list needs 3N. You can use utilities to stripify geometry for you, such as NVTriStrip from NVIDIA.
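To make the saving concrete, here is a minimal sketch in plain C (no OpenGL calls) that compares the index counts of a triangle list and a single strip, and expands a strip back into a list. The function names are illustrative, not part of any library. The winding flip on odd triangles follows the ordering OpenGL uses for GL_TRIANGLE_STRIP.

```c
#include <stddef.h>

/* Indices needed to draw n triangles as an independent list: three each. */
size_t list_index_count(size_t n_triangles)
{
    return 3 * n_triangles;
}

/* Indices needed to draw n triangles as one strip:
   two to start, then one more per triangle. */
size_t strip_index_count(size_t n_triangles)
{
    return n_triangles ? n_triangles + 2 : 0;
}

/* Expand a strip into a triangle list. Every odd triangle has its first
   two indices swapped so that all triangles keep the same winding.
   'out' must have room for 3 * (strip_len - 2) indices.
   Returns the number of indices written. */
size_t strip_to_list(const unsigned *strip, size_t strip_len, unsigned *out)
{
    size_t written = 0;
    size_t i;

    if (strip_len < 3)
        return 0;

    for (i = 0; i + 2 < strip_len; ++i) {
        if (i % 2 == 0) {
            out[written++] = strip[i];
            out[written++] = strip[i + 1];
        } else {
            out[written++] = strip[i + 1];
            out[written++] = strip[i];
        }
        out[written++] = strip[i + 2];
    }
    return written;
}
```

For 1,000 triangles, the list needs 3,000 indices while a single strip needs 1,002. Real meshes rarely stripify into one strip, so the actual saving is somewhat lower.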
Use indexed primitives. This saves memory and avoids repeating the same vertices. A cube's memory footprint, for example, can be roughly halved by indexing its faces properly. Creating indexed primitives is relatively straightforward as a preprocess: Walk your incoming vertex list and, every time you encounter a vertex you haven't encountered before, add it to the vertex table. Then, build the face loops by storing, for each corner, the index of its vertex in that table.
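The preprocess just described can be sketched as follows. This is a minimal version under a few assumptions: the Vec3 type and the build_index name are made up for illustration, vertices are compared for exact equality (a real tool would likely weld with a tolerance), and the linear search is quadratic, which is acceptable offline but would be replaced by a hash table for large meshes.

```c
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;   /* hypothetical vertex type */

/* Build a table of unique vertices plus an index list from a raw vertex
   stream. 'unique' and 'indices' must each have room for n entries.
   Returns the number of unique vertices found. */
size_t build_index(const Vec3 *in, size_t n, Vec3 *unique, unsigned *indices)
{
    size_t n_unique = 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        size_t j = 0;

        /* Look for an identical vertex already in the table. */
        while (j < n_unique &&
               !(unique[j].x == in[i].x &&
                 unique[j].y == in[i].y &&
                 unique[j].z == in[i].z))
            ++j;

        /* Not seen before: append it to the table. */
        if (j == n_unique)
            unique[n_unique++] = in[i];

        indices[i] = (unsigned)j;
    }
    return n_unique;
}
```

Fed a quad stored as two independent triangles (six vertices), this yields four unique vertices and six indices.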
Use compact representations. For example, you can write:
Storing colors as doubles is a performance killer. Most cards use floats internally, so you are wasting bandwidth. The following call is even more inefficient:
Most computer screens work in 24-bit RGB color, so each pixel uses three bytes. Thus, a representation like the following offers the same visual quality while dividing memory use by a factor of four. Less memory implies less bus impact and higher performance.
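To put numbers on these sizes, here is a small sketch in plain C. The struct and function names are illustrative only, and the sizes quoted assume the usual 4-byte float and 8-byte double. The conversion helper shows one common way to map a color component in the 0..1 range to a byte.

```c
#include <stddef.h>

/* One RGB color in three different representations. */
typedef struct { double r, g, b; } ColorD;          /* typically 24 bytes */
typedef struct { float r, g, b; } ColorF;           /* typically 12 bytes */
typedef struct { unsigned char r, g, b; } ColorUB;  /* 3 bytes */

/* Bytes needed for a per-vertex color array. */
size_t color_array_bytes(size_t vertex_count, size_t bytes_per_color)
{
    return vertex_count * bytes_per_color;
}

/* Convert a component in [0,1] to a byte in [0,255], clamping
   out-of-range input and rounding to the nearest value. */
unsigned char float_to_ubyte(float c)
{
    if (!(c > 0.0f))
        return 0;
    if (c >= 1.0f)
        return 255;
    return (unsigned char)(c * 255.0f + 0.5f);
}
```

Under those size assumptions, 3,000 vertex colors take 36,000 bytes as floats and 9,000 bytes as unsigned bytes, which is the factor of four mentioned above. Doubles would take 72,000 bytes.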
Send as little data as possible. Imagine that you need to send a vertex array (VA) of 1,000 triangles, all of them the same color (white, for example). You can fill a color array with copies of that color or, even better, write:
glColor3ub(255,255,255);
glEnableClientState(GL_VERTEX_ARRAY....);
glDrawElements(...)
Repeated components do not need to be sent to the pipeline for each primitive. You can save lots of precious bandwidth by using a single call.
Another item to watch out for is geometry rendering. Some paths into graphics chips are faster than others, and thus knowing how to deliver geometry to the hardware efficiently is key to achieving top performance.
For example, immediate mode rendering (glBegin(GL_TRIANGLES)) is painfully slow when compared to faster alternatives such as VAs. Thus, immediate mode rendering should be avoided at all costs, even for simple interface elements.
You have three alternatives. First, you can use display lists. Display lists offer good performance, but list setup is a slow process, so they are a poor fit for animated data. But for static geometry (interfaces and menus are a good example), they are an excellent rendering method. Second, VAs allow you to modify geometry on the fly. VAs are stored in the application's memory space, so their data has to travel to the card when it is drawn, and small batches waste most of their time on per-call overhead. You should therefore try to batch primitives together in sequences of at least 100 elements. Then, render those using VAs or display lists. On most cards, the "sweet spot" where you can achieve maximum rendering speed is around 1,000 triangles.
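One way to arrange the batching is sketched below in plain C. Everything here is hypothetical scaffolding: the Batcher type, the SubmitFn callback, and the count_submit stand-in are invented for illustration. In a real renderer the callback would set up the vertex array and issue a single draw call for the whole batch. The capacity of 1,000 triangles simply reuses the sweet-spot figure quoted above and should be tuned per card.

```c
#include <stddef.h>

#define BATCH_CAPACITY 1000   /* triangles per batch; tune per card */

typedef struct { float v[9]; } Triangle;   /* three xyz positions */

/* Called once per full (or final) batch. A real implementation would
   issue one vertex-array draw call here. */
typedef void (*SubmitFn)(const Triangle *tris, size_t count, void *user);

typedef struct {
    Triangle tris[BATCH_CAPACITY];
    size_t   count;
    SubmitFn submit;
    void    *user;
} Batcher;

void batcher_init(Batcher *b, SubmitFn submit, void *user)
{
    b->count  = 0;
    b->submit = submit;
    b->user   = user;
}

/* Send whatever has accumulated, if anything. */
void batcher_flush(Batcher *b)
{
    if (b->count > 0) {
        b->submit(b->tris, b->count, b->user);
        b->count = 0;
    }
}

/* Queue one triangle, flushing first if the batch is already full. */
void batcher_add(Batcher *b, const Triangle *t)
{
    if (b->count == BATCH_CAPACITY)
        batcher_flush(b);
    b->tris[b->count++] = *t;
}

/* Stand-in for a draw call: only counts what it receives. */
typedef struct { size_t calls, triangles; } SubmitStats;

void count_submit(const Triangle *tris, size_t count, void *user)
{
    SubmitStats *s = (SubmitStats *)user;
    (void)tris;
    s->calls     += 1;
    s->triangles += count;
}
```

Adding 2,500 triangles and then calling batcher_flush results in three submissions instead of 2,500 individual ones.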
A third, more involved alternative is to create a complete memory manager and take advantage of any special hardware available. Try to allocate frequently used geometry in video memory and take advantage of AGP buses to transfer it efficiently.
Avoid Unneeded State Changes
OpenGL is very sensitive to state changes. Every time you turn lighting on and off, change the active texture map, and so on, you are incurring a performance hit. Thus, a series of measures needs to be taken to ensure your rendering pipeline is as efficient as possible.
First, you can sort your objects by texture and render them with an efficient method (display lists or VAs). When sorting by texture, you ensure the minimum number of texture swaps and also allow longer primitives.
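As a rough illustration of the sorting step, the following plain C sketch orders a draw queue by texture and counts how many texture binds the resulting order would need. The DrawItem type and both function names are assumptions made for this example. A real queue would carry more state and might sort on a composite key.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned texture_id;   /* texture this object needs bound */
    int      mesh_id;      /* hypothetical handle to the geometry */
} DrawItem;

/* qsort comparator: ascending texture id. */
static int by_texture(const void *a, const void *b)
{
    unsigned ta = ((const DrawItem *)a)->texture_id;
    unsigned tb = ((const DrawItem *)b)->texture_id;
    return (ta > tb) - (ta < tb);
}

void sort_by_texture(DrawItem *items, size_t n)
{
    qsort(items, n, sizeof items[0], by_texture);
}

/* Number of texture binds needed to draw the items in the given order,
   counting the initial bind. */
size_t count_texture_binds(const DrawItem *items, size_t n)
{
    size_t binds = 0;
    size_t i;

    for (i = 0; i < n; ++i)
        if (i == 0 || items[i].texture_id != items[i - 1].texture_id)
            ++binds;
    return binds;
}
```

For a queue whose textures arrive in the order 2, 1, 2, 1, 3, 1, drawing as-is takes six binds. After sorting, it takes three, one per distinct texture.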
Second, make sure you are rendering with the minimum state available. Blending usage, for example, should be kept to a minimum because it is computationally expensive. Render all blended primitives together and disable blending as soon as you are finished. The same applies to lighting or to any other rendering option that might slow down your code.
The key to performance under OpenGL is not really about painting triangles fast: It is about not painting them at all, which is faster still. Accurately determining what really needs to be drawn is the best way to ensure good performance. Don't let the hardware clip geometry for you. By the time the hardware discards a triangle, it has already crossed the bus, wreaking havoc with your performance. Detect unseen geometry as early as you can. You can perform hierarchical, per-object clipping, you can cluster-cull back-facing geometry, and some cards can even help you determine occlusions before the real geometry is actually sent to the pipeline. Examine your target platform closely, so you know how to avoid doing more work than is really needed. In Chapter 12, "3D Pipeline Overview," we discussed some popular methods to reduce the rendered data set.
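As one example of per-object culling, here is a minimal bounding-sphere test against six clipping planes, written in plain C. It is a sketch, not the method from Chapter 12: the Plane and Sphere types are assumptions, plane normals are assumed to be unit length and to point into the visible volume, and extracting the planes from the camera matrices is left out.

```c
/* Plane a*x + b*y + c*z + d = 0, with a unit-length normal (a,b,c)
   pointing toward the inside of the view volume. */
typedef struct { float a, b, c, d; } Plane;

/* Bounding sphere of an object, in the same space as the planes. */
typedef struct { float x, y, z, radius; } Sphere;

/* Nonzero if the sphere lies entirely on the outer side of the plane. */
int sphere_outside_plane(const Plane *p, const Sphere *s)
{
    float dist = p->a * s->x + p->b * s->y + p->c * s->z + p->d;
    return dist < -s->radius;
}

/* Nonzero if the sphere may be visible. The test is conservative:
   a sphere near a corner of the volume can pass while being outside,
   but a visible sphere is never rejected. */
int sphere_visible(const Plane planes[6], const Sphere *s)
{
    int i;

    for (i = 0; i < 6; ++i)
        if (sphere_outside_plane(&planes[i], s))
            return 0;
    return 1;
}
```

An object whose sphere fails this test can be skipped before any of its vertices are sent across the bus. Applying the same test to the nodes of a bounding-volume hierarchy lets whole groups of objects be discarded at once.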