pyglet week 2: Better Vertex Throughput

In last week’s 2D Graphics With pyglet and OpenGL, I used the pyglet library to produce some OpenGL triangles on the screen, from my rough-and-ready Python code. This week, I want to try to boost the throughput, to get some idea of how complex a scene we can realistically render from Python, while still maintaining a decent frame rate.

I was a little optimistic in my assessment of how fast last week’s code was running. When I come to measure it carefully, I find that displaying just 85 triangles will bring the framerate down to a minimally acceptable 30fps. This is on my lappy – a Thinkpad T60 with dual 1.6GHz cores, only one of which is busy, and an ATI Radeon Mobility X1400 running at 1680×1050. The framerate seems fairly independent of what size the triangles are, and of whether blend is enabled to make them translucent.

So what can we do to improve this? I suspect that an easy win would be to replace each entity’s single triangle with a collection of triangles, specified by an array of vertices. To assemble the vertex list, we create the first vertex at (0, 0), and then lay all the following vertices in a ring around it.

[Figure: seven vertices, five triangles]

I’ve shown vertex 6 lying adjacent to vertex 1, just to make them both visible, but in actuality they are coincident. Rendering these N vertices using glDrawArrays() can produce N-2 triangles in the best case. All these vertices are shunted to the graphics card, translated, rotated, scaled and rendered in hardware, all without our code having to do any extra work, and hopefully without any significant performance penalty.
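To make the N-2 figure concrete, here is a small sketch of which vertex triples a triangle fan assembles from N vertices. It is not part of the original program, and `fan_triangles` is a hypothetical helper name chosen for illustration.

```python
def fan_triangles(num_vertices):
    """Return the vertex-index triples that a triangle fan is built from.

    Every triangle shares vertex 0 (the hub); each further pair of
    consecutive ring vertices contributes one more triangle.
    """
    return [(0, i, i + 1) for i in range(1, num_vertices - 1)]


triangles = fan_triangles(7)
print(len(triangles))               # 5
print(triangles[0], triangles[-1])  # (0, 1, 2) (0, 5, 6)
```

With the seven vertices in the figure, the last triangle uses vertices 5 and 6, and because vertex 6 coincides with vertex 1 the ring closes up.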

Starting with the code from last week, I modify it to generate the vertex list using the following new static member on class Entity. Note that I have coined the term shard to describe the individual triangles rendered by class Entity:

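from math import cos, pi
from pyglet.gl import GLfloat

class Entity(object):

    numShards = 5
    vertsGl = None

    @staticmethod
    def _generateVerts():
        # Hub vertex at the origin, then a ring of vertices around it
        verts = [0.0, 0.0]
        for i in range(0, Entity.numShards + 1):
            bearing = i * 2 * pi / Entity.numShards
            radius = (2 + cos(bearing)) / 2
            x, y = Position.CoordsFromPolar(radius, bearing)
            verts.append(x)
            verts.append(y)

        Entity.vertsGl = (GLfloat * len(verts))(*verts)

Entity._generateVerts()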


The for-loop simply creates the list of vertex co-ordinates, as illustrated above. The cryptic-looking penultimate line converts that list into an array of GLfloats, as provided by ctypes, and stores that array on a class level attribute, Entity.vertsGl. The final line then calls this member function as soon as the class is defined, creating our vertex array at program startup. We also create a similar array of colors, which will be used to color each vertex, but since I want each fan drawn in a different set of colors, this is done in Entity.__init__(), and the resulting arrays are stored on the instance (not shown).
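The list-to-array conversion can be tried without OpenGL at all. The sketch below is a minimal stand-alone version, assuming that `GLfloat` behaves like ctypes' `c_float` and that the polar conversion uses the usual cos/sin convention; `coords_from_polar` and `generate_fan_verts` are hypothetical names, and the real `Position.CoordsFromPolar` may differ.

```python
from ctypes import c_float
from math import cos, sin, pi


def coords_from_polar(radius, bearing):
    # Assumed convention: bearing in radians, anticlockwise from the x-axis.
    return radius * cos(bearing), radius * sin(bearing)


def generate_fan_verts(num_shards):
    """Build a flat [x0, y0, x1, y1, ...] list for one triangle fan."""
    verts = [0.0, 0.0]  # hub vertex at the origin
    for i in range(num_shards + 1):
        bearing = i * 2 * pi / num_shards
        radius = (2 + cos(bearing)) / 2
        verts.extend(coords_from_polar(radius, bearing))
    return verts


verts = generate_fan_verts(5)
# (c_float * n) is an array type holding n floats; calling it with the
# unpacked list creates an instance filled with those values.
verts_gl = (c_float * len(verts))(*verts)
print(len(verts_gl) // 2)  # 7 vertices, enough for 5 shards
```

The resulting object is a fixed-size block of C floats, which is the kind of thing a call like glVertexPointer() expects to be handed.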

These vertex and color arrays can then be rendered as a triangle fan using the following Entity.draw() method:

    def draw(self):
        glTranslatef(self.pos.x, self.pos.y, 0)
        glRotatef(self.pos.rot, 0, 0, 1)
        glScalef(self.size, self.size, 1)

        glVertexPointer(2, GL_FLOAT, 0, Entity.vertsGl)
        glColorPointer(4, GL_FLOAT, 0, self.colorsGl)

        glDrawArrays(GL_TRIANGLE_FAN, 0, len(self.vertsGl) // 2)
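The per-instance color array is not shown in the post. As a rough sketch of what it might look like, assuming random colors and a fixed alpha (both of which are guesses, as is the helper name `make_fan_colors`), the array needs four floats per vertex to match the `4` passed to glColorPointer():

```python
import random
from ctypes import c_float


def make_fan_colors(num_vertices, rng):
    """Return a ctypes array of RGBA floats, four per vertex."""
    colors = []
    for _ in range(num_vertices):
        # Random RGB in [0, 1); alpha fixed at 0.5 (an arbitrary choice)
        # so that overlapping fans look translucent when blending is on.
        colors.extend([rng.random(), rng.random(), rng.random(), 0.5])
    return (c_float * len(colors))(*colors)


rng = random.Random(0)  # fixed seed so repeated runs give the same colors
colors_gl = make_fan_colors(7, rng)
print(len(colors_gl))  # 28 floats: 7 vertices x 4 components
```

The color array must describe the same number of vertices as the vertex array, otherwise glDrawArrays() would read past the end of the shorter one.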

With other minor tweaks to give a new background color, running this with 3 shards per Entity produces quite a pleasing effect:

[Figure: 3 shards per entity]

At 30fps, we can still manage 85 entities, and we’re now rendering a fan of three shards for each one, so we’ve tripled our throughput to 255 triangles per frame. I suspect it can get better though. Let’s try cranking up the number of shards per fan, while reducing the number of fans to maintain 30fps:

[Figures: fans drawn with 7, 20, 400, 1,200 and 12,000 shards per fan]

Above about 200 shards per fan, the shards become so thin that they produce moiré effects, and above 10,000 a crazy white artifact starts appearing in the middle of the fans. But nevertheless, the times taken to render these frames show a strong trend:

Shards per entity    Entities at 30fps    Triangles per frame
3                    85                   255
7                    85                   595
20                   85                   1,700
100                  82                   8,200
400                  68                   27,200
1,200                48                   57,600
1,800                39                   70,200
3,000                29                   87,000
6,000                17                   102,000
12,000               10                   120,000
100,000              1                    100,000
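The per-frame figures are simply shards per entity multiplied by the number of entities, and multiplying again by 30 gives triangles per second. A small sketch of that arithmetic, using the measurements listed above (the variable names are mine):

```python
FPS = 30

# (shards per entity, entities sustained at 30fps), from the measurements above
measurements = [
    (3, 85), (7, 85), (20, 85), (100, 82), (400, 68), (1200, 48),
    (1800, 39), (3000, 29), (6000, 17), (12000, 10), (100000, 1),
]

for shards, entities in measurements:
    per_frame = shards * entities
    per_second = per_frame * FPS
    print(f"{shards:>7} x {entities:>2} = {per_frame:>7} per frame, "
          f"{per_second:>9} per second")

# The configuration with the highest triangle count per frame
best = max(measurements, key=lambda m: m[0] * m[1])
print(best)  # (12000, 10)
```

On these numbers the peak of 120,000 triangles per frame works out to 3.6 million triangles per second.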

Fewer fans, each with more shards, results in much higher triangle throughput – up to 120,000 triangles per frame. Although it’s exciting to see such high figures, I’d almost rather it wasn’t the case – I’d prefer to create a game with more independent entities wandering around, regardless of how little graphical detail they could be adorned with. But there you have it, blame John Carmack. Anyhow, it’s clear that we can deliver sufficient graphical grunt to put together some sort of game. Next time I hope to make a start on putting all these triangles to good use.

Update: For a 500% performance boost when running under Linux, invoke Python with the -O flag. I can now get 500 fans on screen, each with 100 triangles, at 30fps. See comments below.
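As far as I understand it, the gain comes from pyglet's `debug_gl` option, which checks for errors after every OpenGL call and defaults to off when Python runs with -O. The sketch below only shows how a program can tell whether it was started that way; it is plain Python and does not import pyglet.

```python
import sys


def running_optimized():
    """Return True when the interpreter was started with -O (or -OO)."""
    # __debug__ is a built-in constant: True normally, False under -O.
    return not __debug__


print(running_optimized())
print(sys.flags.optimize)  # 0 normally, 1 under -O, 2 under -OO
```

If that explanation is right, setting `pyglet.options['debug_gl'] = False` before importing `pyglet.gl` should give a similar effect without the flag, though I have not measured it here.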

On to Part 3 – Some Pretty Flowers…

Download the source


20 thoughts on “pyglet week 2: Better Vertex Throughput”

  1. Pingback: » Pyglet week 3 : Some Pretty Flowers

  2. @Richard: Ah, now I come to re-read, no doubt that’s what you were indicating on your original post with the phrase ‘including mutability’. :-)

  3. @Tartley: yes you can mutate the contents of a VBO, and the pyglet wrapping around them makes it really nice (though again since this hasn’t been released there’s no docs yet, but there are examples).

  4. Updates

    1) Alec is right.

    Invoking python with ‘-O’ on Windows gives me 10-20% speed improvement, which is fantastic.

    On Linux, however, it gives me over 500% performance improvement, just as he promised, which is just astonishing. Best of all, the performance gains are greatest in the cases with many entities on screen. I can now render 500 entities, with 100 triangles in each, at 30fps, on a shonky laptop. Absolutely brilliant.

    2) Use a seed.

    If you’re generating positions and sizes randomly like I am, and hoping to compare performance from one run to the next, then call random.seed(0) at the start of your program. Without it the varying size and placement of the fans was perturbing performance by a significant amount from one run to the next.

  5. If I’m drawing two entities from the same set of vertices, ie, the same as:


    Can I batch these up into a single glDrawElements() call (or one of its sibling functions, glMultiDrawElements(), etc.)?

    I don’t know how to include transformation changes into a single batched call. Perhaps it can’t be done. Can anyone advise on approaches?

  6. @Richard: I don’t even know what a VBO is! Oh, hang on, is it a vertex buffer object? Which stores the vertices on the graphics card, rather than system memory? That was the next chapter I was going to skip to in the red book. Will using these cause me problems if I want to animate the vertices? For example, imagine we’re drawing a pac-man style ghost, and I want to independently animate the eyes (looking at pacman), the mouth (sometimes frowny), and the bottom edge (constantly rippling). What are my options? I can see a few, but they all seem to have drawbacks. I’ve clearly got some reading to do.

    @theatrus: Thanks, so I’m beginning to glean.

    @Alec: Brilliant, thanks for bringing that to my attention, I’ll try it out on my setup, see what happens.

  7. @John: Brilliant, thanks very much indeed for that. I took the liberty of editing your post to put the code in (pre class=”prettyprint”) tags, which preserves whitespace and colors by syntax.

    I wasn’t too worried by the performance of generateVerts(), since it only gets called once on program startup – but obviously the changes you suggest will be very handy if I start animating vertices with the CPU and having to re-generate the vertex lists, which is something I’m really looking forward to looking into.

    Obviously the sort of techniques you suggest will be relevant elsewhere too. I’ll roll the ideas into my future investigations.

    Ah, and only this very morning I was wondering whether NumPy could operate on arrays which would be useable as input to OpenGL routines. I am *delighted* to hear that it can. I will definitely be trying this out, since it will give me a feasible method to move vertices around dynamically. I’m keen to produce dynamic shadow sort of effects, and to indicate in-game entities’ state by changing their appearance, which can’t come simply from precanned vertex ordinates.

    Brilliant, thanks heaps.

  8. Pyglet 1.1 will include a graphics module which at its core provides a lovely Python wrapping around VBOs (including mutability). You can only get it via SVN at the moment. It’s been in there for a while now and is the core of pyglet’s new text rendering and sprite code.

  9. The trick with modern graphics hardware is not to reduce the number of triangles, it is to reduce the number of batches sent to the card.

  10. I just took a look at your code and I have a few comments. Python looks up variable names in the order of LEGB (local, enclosing, global, and builtins). So whenever you have a loop that is going to be executed a lot of times it’s best to create a local variable to hold a reference to a variable outside of the local name space before the loop.

    Also, function calls in Python are fairly slow so any time you can reduce them in these loops the code will execute much faster. In the _generateVerts method the append method of list is being called twice for each iteration but could be reduced to 1 call by using the extend method. With these 2 suggestions _generateVerts will reduce its execution time by more than 40% in the case when numShards = 100,000.

        def _generateVerts():
            verts = [0.0, 0.0]
            _cos = cos
            CoordsFromPolar = Position.CoordsFromPolar
            tmp = 2 * pi / Entity.numShards
            verts_extend = verts.extend
            for i in range(0, Entity.numShards+ 1):
                bearing = i * tmp
                radius = (2 + _cos(bearing)) / 2
                verts_extend(CoordsFromPolar(radius, bearing))
            Entity.vertsGl = (GLfloat * len(verts))(*verts)

    These 2 techniques can also be applied to the rest of your code to speed it up. I have not done so with the rest of your code so I can not report performance gains it would have. You will have to try it for yourself. But you can gain even more performance if you start using numpy [1] arrays. Using numpy arrays you will be able to eliminate most of your loops as calculations on numpy arrays are performed on all elements and perform almost as fast as C code. You may also find some use for the scipy [2] library which takes advantage of numpy arrays.

    For a couple additional tricks see David Goodger’s Code Like a Pythonista: Idiomatic Python [3] as well as a few of the references he mentions.


  11. @Esteban: Thanks, that’s the sort of thing I was hoping might be true. Does anyone have any order-of-magnitude guesses of how fast code like the above should be in C? Dammit, I probably have some ten-year-old code lying around of my own that I could try out… (rummage, rummage)…

  12. Skim a good OpenGL book (e.g. the Red book or the Superbible) on display lists, vertex buffer objects, and texture objects. If you keep the data in the GPU and only ‘trigger’ it from Python code, there should be no noticeable difference between the time it takes to render a scene in Python/pyglet and what it takes in C.

  13. @Ezequiel: Glad you liked it, and I do plan to do more of the same, but be warned they will be interspersed between whatever other crap is on my mind at the time. Par for the course, I guess.

  14. @Dave: Thanks! I owned a copy of the red book years ago, when I was young and dynamic, but I’ve forgotten it all now, so I just purchased a copy of the shiny new edition. As far as I know, the technique above *does* use vertex arrays (or at least arrays of vertices.) Have I misconstrued something?

    I chose to draw fans using GL_TRIANGLE_FAN, instead of GL_TRIANGLES, since it can draw N-2 triangles from N vertices. Using GL_TRIANGLES can only draw N/3 triangles for the same bandwidth and GPU effort of transforming the same N vertices. Sounds worthy of investigation for comparison’s sake though. Presumably I’ll be using a mixture going forward.

    Also, I considered display lists, but the red book ‘hints and tips’ chapter enigmatically advises to prefer vertex arrays for performance. Can anyone shed any light on why?

  15. @Nick: God, it’s about time you realised it. I’ve been biting my tongue for years now. Don’t let the door hit you on the way out.

  16. I am clearly not suited to this industry.

    I must leave.

    I’m going to be a dustman, or sell hotdogs on the corner.

  17. I’ll add you to my rss reader.

    I’d like to read more about pyglet and python in general. And especially this kind of “performance investigations”.

  18. Have you looked at the OpenGL Programming guide (the red book)?

    There’s probably a few other ways to allow more triangles – it’s been a while, but off the top of my head you could use GL_TRIANGLES for all of the entities, or you could set it up to work with vertex arrays, or, depending on what you want from the code, with display lists.

    A bit more work, but probably necessary if you’re trying to benchmark Pyglet. I’ve been meaning to have a bit of play with Pyglet myself, so the throughput is a data point I’m interested in.

  19. Pingback: » 2D Graphics With Pyglet and OpenGL
