Many of the performance tuning techniques discussed here (e.g., minimizing the number of state changes and disabling features that are not required) are a good idea no matter what system you are targeting. Other tuning techniques are specific to a particular system. OpenGL implementations vary widely, so a command that is inexpensive on one platform may be expensive on another. For example, before you sort your database based on state changes, you need to determine which state changes are the most expensive on each system you intend to run on.
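Once you know which state change is the most expensive on a given system, you can sort by that state first so it changes as rarely as possible. The following is a minimal sketch, assuming a hypothetical `DrawItem` record with only two state keys (texture and blend mode) and assuming that texture changes were measured to be the more expensive of the two. The per-change costs passed to `state_change_cost` stand in for whatever your own measurements produce.

```c
#include <stdlib.h>

/* Hypothetical draw record: the state each batch of geometry needs. */
typedef struct {
    int texture;   /* texture object ID */
    int blend;     /* blend mode index */
} DrawItem;

/* Sort key order: the state assumed to be most expensive (texture) first,
   then the cheaper one (blend), so expensive changes happen least often. */
static int compare_items(const void *a, const void *b) {
    const DrawItem *x = a, *y = b;
    if (x->texture != y->texture)
        return (x->texture > y->texture) - (x->texture < y->texture);
    return (x->blend > y->blend) - (x->blend < y->blend);
}

/* Total cost of the state changes between consecutive items, weighting
   each change by its measured per-platform cost. The initial state setup
   for the first item is not counted. */
static double state_change_cost(const DrawItem *items, size_t n,
                                double texture_cost, double blend_cost) {
    double cost = 0.0;
    for (size_t i = 1; i < n; i++) {
        if (items[i].texture != items[i - 1].texture) cost += texture_cost;
        if (items[i].blend != items[i - 1].blend) cost += blend_cost;
    }
    return cost;
}

void sort_by_state(DrawItem *items, size_t n) {
    qsort(items, n, sizeof *items, compare_items);
}
```

On a system where blend changes turn out to be the more expensive ones, you would swap the two comparisons in `compare_items`; `state_change_cost` lets you compare the orderings with the measured weights before committing to one.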
In addition, you may want to modify the behavior of your program depending on which modes are fast. This is especially important for programs that must sustain a minimum frame rate; such programs may need to disable features to stay interactive. For example, if a particular texture mapping environment is slow on one of your target systems, you may need to disable texture mapping or change the texture environment whenever your program runs on that platform.
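One simple policy is to start with every optional feature enabled and give features up, least important first, until the measured frame time fits the budget. This is a minimal sketch under stated assumptions: the three feature flags and the `drop_order` priority list are hypothetical, and `frame_ms` is assumed to hold a measured frame time for every combination of flags (indexed by the flag bitmask), such as the results of trial runs at startup.

```c
#include <stddef.h>

/* Hypothetical optional features, as bit flags. */
enum {
    FEAT_TEXTURE = 1 << 0,
    FEAT_FOG     = 1 << 1,
    FEAT_BLEND   = 1 << 2
};
#define FEAT_COUNT 3

/* The order in which features are given up: least important first. */
static const unsigned drop_order[FEAT_COUNT] = { FEAT_FOG, FEAT_BLEND, FEAT_TEXTURE };

/* frame_ms[mask] is the measured frame time, in milliseconds, with the
   features in `mask` enabled. Returns the mask to use: all features,
   minus the least important ones, until the frame fits in budget_ms.
   Returns 0 (all optional features off) if nothing fits. */
unsigned choose_features(const double frame_ms[1u << FEAT_COUNT], double budget_ms) {
    unsigned mask = (1u << FEAT_COUNT) - 1;   /* start with everything on */
    for (size_t i = 0; i < FEAT_COUNT && frame_ms[mask] > budget_ms; i++)
        mask &= ~drop_order[i];
    return mask;
}
```

For a 60 Hz target you would pass a budget of roughly 16.67 ms. A fixed drop order keeps the policy predictable; a more elaborate program could instead search all measured combinations for the best-looking one that fits.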
Before you can tune your program for each of the target platforms, you need to characterize those platforms' performance. This is not always straightforward. Often a particular device can accelerate certain features, but not all of them at once. Thus it is important to test performance with the combinations of features that you will actually use. For example, a graphics adapter may accelerate texture mapping, but only for certain texture parameters and texture environment settings. Even if all texture modes are accelerated, you will need to experiment to see how many textures you can use at once before the adapter starts paging textures in and out of its local memory.
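Because features may be fast individually but slow together, the test has to cover combinations, not single features. The sketch below times every on/off combination of a small set of feature flags. `RenderFn` is a hypothetical callback that draws one representative frame with the given features enabled; in a real OpenGL program it should end with `glFinish()`, which blocks until all previously issued commands have completed, so that the timer measures finished rendering rather than command submission. The timer uses C11 `timespec_get`; a monotonic clock such as POSIX `clock_gettime(CLOCK_MONOTONIC, ...)` is preferable where available.

```c
#include <time.h>

/* Hypothetical callback: draws one representative frame with the
   features in feature_mask enabled (ending with glFinish() in real code). */
typedef void (*RenderFn)(unsigned feature_mask);

/* Current wall-clock time in milliseconds (C11). */
static double now_ms(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

/* Times every combination of num_features on/off flags and stores the
   average time per frame in out_ms[mask]. out_ms must hold
   1 << num_features entries, and frames must be greater than zero. */
void time_feature_combinations(RenderFn render, unsigned num_features,
                               int frames, double *out_ms) {
    for (unsigned mask = 0; mask < (1u << num_features); mask++) {
        render(mask);   /* untimed warm-up, so one-time setup is excluded */
        double start = now_ms();
        for (int f = 0; f < frames; f++)
            render(mask);
        out_ms[mask] = (now_ms() - start) / frames;
    }
}
```

The number of combinations doubles with each feature, so it pays to test only the features whose interaction you actually care about. To probe texture paging, the same loop structure works with the number of bound textures as the varying parameter instead of a feature mask.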
An even more complicated situation arises if the graphics adapter has a shared pool of memory that is allocated to several tasks. For example, the adapter may not have a framebuffer deep enough to contain both a depth buffer and a stencil buffer. In this case, the adapter would be able to accelerate depth buffering and stenciling, but not at the same time. Or perhaps depth buffering and stenciling can both be accelerated, but only for certain stencil buffer depths.
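A back-of-the-envelope calculation shows how this can happen. The sketch below is a simplified model, not a description of any real adapter: it assumes each buffer occupies a whole number of bytes per pixel and that the color, depth, and stencil buffers all come out of one pool of `pool_bytes`. Real hardware may pack buffers differently, so measurement remains the only reliable test.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bytes for a per-pixel buffer, rounding each pixel up to whole bytes
   (a simplifying assumption; real hardware may pack differently). */
static size_t buffer_bytes(unsigned width, unsigned height, unsigned bits) {
    return (size_t)width * height * ((bits + 7) / 8);
}

/* Do the color, depth, and stencil buffers all fit in the shared pool
   at the same time? A buffer with 0 bits takes no memory. */
bool buffers_fit(unsigned width, unsigned height,
                 unsigned color_bits, unsigned depth_bits, unsigned stencil_bits,
                 size_t pool_bytes) {
    size_t need = buffer_bytes(width, height, color_bits)
                + buffer_bytes(width, height, depth_bits)
                + buffer_bytes(width, height, stencil_bits);
    return need <= pool_bytes;
}
```

At 1024 × 768 with 32-bit color, a 24-bit depth buffer, and an 8-bit stencil buffer, the model needs 786,432 pixels × 8 bytes = 6,291,456 bytes (6 MiB). In a hypothetical pool one byte smaller than that, depth buffering alone fits (5,505,024 bytes) but depth buffering plus stenciling does not, which is exactly the "both, but not at the same time" situation.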
Typically, per-platform testing is done at initialization time. Make some trial runs through your data with different combinations of state settings, and measure the time it takes to render in each case. You may want to save the results in a file so that your program does not have to repeat the tests each time it starts up. For an example of how to measure the performance of particular OpenGL operations and save the results, see the isfast program on the web site.
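Cached results are only valid for the system that produced them, so the cache should record which renderer it describes. The following minimal sketch uses a plain-text format of its own invention (not the format used by isfast): the first line holds a renderer identifier, for example the string returned by `glGetString(GL_RENDERER)`, and each following line holds an index and a timing. On startup, a program would call `load_timings` and rerun the trials only if it fails.

```c
#include <stdio.h>
#include <string.h>

/* Writes a renderer identifier line followed by n "index time" lines.
   %.17g preserves each double exactly. Returns 0 on success, -1 on failure. */
int save_timings(const char *path, const char *renderer,
                 const double *ms, size_t n) {
    FILE *fp = fopen(path, "w");
    if (!fp) return -1;
    fprintf(fp, "%s\n", renderer);
    for (size_t i = 0; i < n; i++)
        fprintf(fp, "%zu %.17g\n", i, ms[i]);
    return fclose(fp) == 0 ? 0 : -1;
}

/* Reads results written by save_timings. Returns 0 only if the file
   exists, was written for the same renderer, and contains all n entries
   in order; otherwise returns -1 (ms may be partially overwritten) and
   the caller should rerun the trials and rewrite the cache. */
int load_timings(const char *path, const char *renderer,
                 double *ms, size_t n) {
    char line[256];
    FILE *fp = fopen(path, "r");
    if (!fp) return -1;
    int status = 0;
    if (!fgets(line, sizeof line, fp)) {
        status = -1;
    } else {
        line[strcspn(line, "\n")] = '\0';   /* strip the trailing newline */
        if (strcmp(line, renderer) != 0) status = -1;
    }
    for (size_t i = 0; status == 0 && i < n; i++) {
        size_t index;
        if (fscanf(fp, "%zu %lf", &index, &ms[i]) != 2 || index != i)
            status = -1;
    }
    fclose(fp);
    return status;
}
```

Keying the cache on the renderer string means a driver or hardware change that alters that string triggers a fresh set of trials; a more cautious program might also record the `GL_VERSION` string.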