Now that a scissor (a clipping region) can be specified, we don't have
to worry about rendering results outside the specified region.
Replace the implementation of Fill with just a DrawTriangles call
using an empty white image.
As a side effect, SubImage is available for Fill.
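For illustration only, the effect of a clipped fill can be sketched with
the standard image and image/draw packages. This is an analogy, not the
Ebiten implementation: fill and fillSubImage are hypothetical helpers,
and draw.Draw's clipping to the destination bounds stands in for the
scissor.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"image/draw"
)

// fill paints all of dst with c. draw.Draw clips to dst.Bounds(), so
// when dst is a sub-image only that region of the parent is touched.
func fill(dst draw.Image, c color.Color) {
	draw.Draw(dst, dst.Bounds(), image.NewUniform(c), image.Point{}, draw.Src)
}

// fillSubImage fills the central 2x2 region of a 4x4 image and returns
// one pixel outside and one pixel inside that region.
func fillSubImage() (outside, inside color.RGBA) {
	img := image.NewRGBA(image.Rect(0, 0, 4, 4))
	sub := img.SubImage(image.Rect(1, 1, 3, 3)).(*image.RGBA)
	fill(sub, color.RGBA{R: 0xff, A: 0xff})
	return img.RGBAAt(0, 0), img.RGBAAt(1, 1)
}

func main() {
	outside, inside := fillSubImage()
	fmt.Println("outside:", outside, "inside:", inside)
}
```

The pixel outside the sub-image keeps its zero value, while the pixel
inside becomes opaque red.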
Fixes #1416
Instead, the callers (ebiten.NewImageFromImage and
(*ebiten.Image).ReplacePixels) now have the responsibility to copy
the pixels. This change should reduce unnecessary copying of pixels.
Updates #1222
This CL fixes the race condition on delayedCommands, which can be
accessed and set to nil at the same time.
This CL separates some operations on delayedCommands into slow
paths and fast paths, and uses a mutex only on the slow paths for
performance. The implementation is based on sync.Once.
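The fast-path / slow-path split can be sketched as below. This is a
rough sketch in the style of sync.Once, not the actual code:
delayedQueue, do, doSlow and flush are hypothetical names. The fast
path is a single atomic load; the mutex is taken only while the queue
has not been flushed yet.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// delayedQueue is a hypothetical stand-in for the delayed command list.
type delayedQueue struct {
	flushed  uint32 // 1 once flush has completed; accessed atomically
	m        sync.Mutex
	commands []func()
}

// do runs f immediately if the queue was already flushed (fast path),
// and otherwise falls back to the mutex-protected slow path.
func (q *delayedQueue) do(f func()) {
	if atomic.LoadUint32(&q.flushed) == 1 {
		f()
		return
	}
	q.doSlow(f)
}

// doSlow re-checks the flag under the lock, since flush may have
// finished between the atomic load in do and acquiring the mutex.
func (q *delayedQueue) doSlow(f func()) {
	q.m.Lock()
	if atomic.LoadUint32(&q.flushed) == 0 {
		q.commands = append(q.commands, f)
		q.m.Unlock()
		return
	}
	q.m.Unlock()
	f()
}

// flush runs the queued commands once and then releases the slice.
// As with sync.Once, a queued command must not call do, or it deadlocks.
func (q *delayedQueue) flush() {
	q.m.Lock()
	defer q.m.Unlock()
	if atomic.LoadUint32(&q.flushed) == 1 {
		return
	}
	for _, c := range q.commands {
		c()
	}
	q.commands = nil
	atomic.StoreUint32(&q.flushed, 1)
}

func main() {
	q := &delayedQueue{}
	var order []int
	q.do(func() { order = append(order, 1) }) // queued
	q.flush()                                 // runs the queued command
	q.do(func() { order = append(order, 2) }) // runs immediately
	fmt.Println(order)
}
```

The flag is stored only after the queued commands have run, so a
concurrent do either queues behind the lock or sees the completed flush.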
Fixes #1195
This change creates a new buffered.Image even for a sub-image. This
can increase memory usage a little, but decreases GPU memory
usage since only the necessary pixels are allocated on a texture
atlas.
Fixes #896
Updates #1194
This change also makes it possible to remove the optimization at
(*buffered.Image).ReplacePixels.
// This commit w/ the optimization
BenchmarkImageDrawOver-8 60225 19241 ns/op
// This commit w/o the optimization
BenchmarkImageDrawOver-8 66567 17700 ns/op
// The previous w/ the optimization
BenchmarkImageDrawOver-8 62355 19580 ns/op
// The previous w/o the optimization
BenchmarkImageDrawOver-8 54460 22768 ns/op
Updates #1137
(*Image).At can be unnecessarily slow since it tries to read
pixels from the GPU. This change reduces the chance of reading from
the GPU by using the image's pending pixels when possible.
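A sketch of the idea, with hypothetical types (gpuTexture,
bufferedImage) standing in for the real ones: at answers from the
pending pixels when they exist and falls back to the expensive read
only when they do not. The reads counter exists only to make the
difference observable.

```go
package main

import (
	"fmt"
	"image/color"
)

// gpuTexture is a hypothetical stand-in for a texture whose reads are slow.
type gpuTexture struct {
	pix   []byte
	reads int // number of (expensive) read-backs performed
}

func (t *gpuTexture) readPixels() []byte {
	t.reads++
	return t.pix
}

// bufferedImage keeps pixels that were set but not yet uploaded.
type bufferedImage struct {
	width, height int
	tex           *gpuTexture
	pending       []byte // RGBA bytes; nil when nothing is pending
}

func newBufferedImage(w, h int) *bufferedImage {
	return &bufferedImage{width: w, height: h, tex: &gpuTexture{pix: make([]byte, 4*w*h)}}
}

func (i *bufferedImage) replacePixels(pix []byte) { i.pending = pix }

// flush uploads the pending pixels and clears them.
func (i *bufferedImage) flush() {
	if i.pending != nil {
		i.tex.pix = i.pending
		i.pending = nil
	}
}

// at prefers the pending pixels and reads the texture only as a fallback.
func (i *bufferedImage) at(x, y int) color.RGBA {
	if x < 0 || y < 0 || x >= i.width || y >= i.height {
		return color.RGBA{}
	}
	pix := i.pending
	if pix == nil {
		pix = i.tex.readPixels()
	}
	o := 4 * (y*i.width + x)
	return color.RGBA{R: pix[o], G: pix[o+1], B: pix[o+2], A: pix[o+3]}
}

func main() {
	img := newBufferedImage(1, 1)
	img.replacePixels([]byte{10, 20, 30, 255})
	fmt.Println(img.at(0, 0), img.tex.reads) // served from pending pixels
	img.flush()
	fmt.Println(img.at(0, 0), img.tex.reads) // now needs a texture read
}
```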
Fixes #1137
This simple change brings my simple test project from 752 allocations per frame to 474 allocations per frame. It seems a shame that Go's escape analysis is not smart enough to leave this variable on the stack.
GeoM is a 24-byte struct, so there is a slight perf trade-off: with this change we store it on the stack, but we also copy it around (instead of an 8-byte pointer). This could make things faster (due to stack / CPU cache) or slower (due to copying more memory). When I try a stress test (drawing 100K images per frame), I can't see any actual performance difference, but I do see 100K fewer allocations, and the GC is no longer running almost all the time.
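The allocation difference can be reproduced in isolation with a sketch like the following. Everything here is hypothetical (geoM, optionsPtr, optionsVal and the sink variables are made up for the example, and storing into a package-level variable is just a simple way to force the pointer to escape); it is not the code touched by this change. testing.AllocsPerRun reports the average number of heap allocations per call.

```go
package main

import (
	"fmt"
	"testing"
)

// geoM is a hypothetical 24-byte matrix: six float32 fields.
type geoM struct {
	a, b, c, d, tx, ty float32
}

// optionsPtr holds the matrix by pointer, optionsVal holds it by value.
type optionsPtr struct{ geom *geoM }
type optionsVal struct{ geom geoM }

// Package-level sinks keep the compiler from optimizing the stores away.
var (
	sinkPtr optionsPtr
	sinkVal optionsVal
)

// usePtr stores a pointer that outlives the call, so the matrix escapes
// to the heap.
func usePtr(tx float32) {
	sinkPtr = optionsPtr{geom: &geoM{a: 1, d: 1, tx: tx}}
}

// useVal copies the 24 bytes instead; no heap allocation is needed.
func useVal(tx float32) {
	sinkVal = optionsVal{geom: geoM{a: 1, d: 1, tx: tx}}
}

// measureAllocs returns the average allocations per call for each variant.
func measureAllocs() (ptr, val float64) {
	ptr = testing.AllocsPerRun(1000, func() { usePtr(2) })
	val = testing.AllocsPerRun(1000, func() { useVal(2) })
	return ptr, val
}

func main() {
	ptr, val := measureAllocs()
	fmt.Printf("pointer field: %.0f allocs/op, value field: %.0f allocs/op\n", ptr, val)
}
```

The exact numbers depend on the compiler version, but the pointer variant is expected to allocate on every call while the value variant is not.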