DrawTriangles is slow because of heavy memory allocation and the
resulting garbage collection cost. On my laptop, with ~24k triangles,
this patch takes it from ~47TPS to 60TPS. The first part allocates a
vertex buffer of the right size up front; that alone got to about
55TPS. The second part replaces the frequent []float32 allocations
in Vertex() calls with writes of the desired values into a provided
destination slice.
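
To illustrate the two changes side by side, here is a minimal sketch,
not the code in this patch. The Point type, the four-float vertex
layout, and the fillAppend/fillPrealloc helpers are hypothetical
stand-ins; only the Vertex() and PutVertex names come from the
description above, and the real signatures may differ.

```go
package main

import "fmt"

// floatsPerVertex is an assumed layout for this sketch: x, y, u, v.
const floatsPerVertex = 4

// Point is a hypothetical stand-in for whatever type supplies vertices.
type Point struct{ X, Y, U, V float32 }

// Vertex mirrors the old pattern: every call returns a fresh slice.
func (p Point) Vertex() []float32 {
	return []float32{p.X, p.Y, p.U, p.V}
}

// PutVertex mirrors the new pattern: it writes the same values into a
// caller-provided slice. dst must hold at least floatsPerVertex values.
func (p Point) PutVertex(dst []float32) {
	dst[0], dst[1], dst[2], dst[3] = p.X, p.Y, p.U, p.V
}

// fillAppend grows the buffer as it goes, so append may reallocate and
// copy several times, on top of the per-vertex slices from Vertex.
func fillAppend(pts []Point) []float32 {
	var buf []float32
	for _, p := range pts {
		buf = append(buf, p.Vertex()...)
	}
	return buf
}

// fillPrealloc sizes the buffer once and writes each vertex in place.
func fillPrealloc(pts []Point) []float32 {
	buf := make([]float32, len(pts)*floatsPerVertex)
	for i, p := range pts {
		p.PutVertex(buf[i*floatsPerVertex : (i+1)*floatsPerVertex])
	}
	return buf
}

func main() {
	pts := []Point{{0, 0, 0, 0}, {1, 0, 1, 0}, {0, 1, 0, 1}}
	fmt.Println(len(fillAppend(pts)), len(fillPrealloc(pts)))
}
```

Both helpers produce the same buffer contents; the second does a
single allocation per call regardless of how many vertices there are.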

Time spent drawing triangles over 1,000 frames:

  13.07s  baseline
  11.09s  preallocate whole buffer to avoid resizing
   6.13s  use new PutVertex function

This might need some cleanup, but I think it's good evidence that
the design change is viable.