Affine: fullscreen picture zoom and rotation
Affine  Picture zoom/rotation/shear tutorial
Introduction
In this tutorial we will learn how to display a picture on the screen after an "affine transformation" including scale, shear and rotation. Different optimization techniques are used to speed up the display. Such an effect can be used in a game as a splash screen or as a rotating play field.
Download and install the .bin file to see live the different steps of the implementation (just press A/B to switch between steps).
Part 1: Setup
This first part is just about importing the picture into the sketch and display how many milliseconds are necessary for a draw with the drawImage
function of the Gamebuino library.
Import the picture
First of all we need a picture to display. I built the following picture based on graphics of the Gamebuino website (so it can be used for marketing, if anyone wishes, and there is no discussion about copyright!), but you can use any picture you want. To use the full set of optimizations presented in this tutorial, the picture needs to be a square of 256x256 pixels, but the technique can be adapted to other picture sizes (preferably powers of 2).
Rename the picture as "Picture.bmp" and then use the imagetocode converter to get a long text defining a data array and an Image object. The conversion may take some seconds to complete due to the size of the picture.
Create a new sketch (or use an existing one if you know what you are doing), and insert the code. Then add the classical Gamebuino code, and display the picture using drawImage
to see if everything is correctly setup.
The code should look like this (the line with PictureData
is truncated on purpose, otherwise it would take the whole page):
#include <GamebuinoMeta.h> const uint16_t PictureData[] = {256,256,1, 1, 0, 0, 0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0..... [snip] Image Picture = Image(PictureData); void setup() { // Starts Gamebuino library and sets target frame rate gb.begin(); gb.setFrameRate(50); } void loop() { while(!gb.update()); gb.display.drawImage(0, 0, Picture); }
Please note that the setFrameRate
command sets already the target frame rate, which will be reached at the end of the tutorial.
At this point, executing the sketch should display the topleft part of the picture.
Measure the duration of the draw
We are looking for performance in this tutorial, so we need to measure how long it takes to draw the picture, and hence assess the effect of each optimization. This measurement is not necessary for the zoom/rotation effect, it is implemented only for didactic purposes. You can jump to the next chapter if you want.
To measure the time we use the function millis() of the Arduino library, which returns the number of milliseconds since the start of the current program.
Insert the measurement of the start time just under the while(!gb.update());
line:
// Gets the time at start of the computation unsigned long startTime = millis();
And insert the computation and display of the duration at the end of the loop()
function:
// Computes and displays duration unsigned long duration = millis()startTime; gb.display.clearTextVars(); gb.display.setColor(WHITE, BLACK); gb.display.print(duration); gb.display.print(" ms");
The number is in milliseconds, drawn on a nontransparent background for readability. Now the setup is completed, you should see something like this on the screen:
Time to jump to the next part and discover the draw concepts.
Part 2: Slow draw using floatbased computation
This step presents the general idea of the draw technique and a first, nonoptimized implementation.
General idea
To display a picture to the screen with arbitrary zoom and rotation, two families of algorithms can be used:
 Method 1: For each pixel of the source picture, we determine the position and size of this pixel on the screen and draw the pixel at this position
 Method 2: For each pixel of the screen, we determine the color of the pixel in the source image and draw it at the current position on the screen
The method 1 has some drawbacks:
 In case the image covers a few pixels on the screen, the algorithm will anyway look at each pixel in the source picture and often draw several times at the same pixel position on the screen
 If the picture is drawn with a zoom, then gaps will be visible because one pixel in the source picture would need to be drawn as a rectangle on the screen.
You got it, we will use the method 2.
The basic idea here is to start from a point at given coordinates within the picture, and to jump to the next point by adding constant increments to the coordinates.
More precisely, on the following diagram, the algorithm is represented for a screen of 6x5 pixels:
We start at point A and draw the color of A at (0,0) on the screen. Then, for the second point we jump to the next column from A to B and draw the color of B on the screen at (1,0). We do the same on the first row until we reach C, in turn drawn at (4,0) on the screen. Now to go to the next row, we come back to A and jump to D, and so on.
Please note that we have 3 coexisting coordinate systems in the algorithm: the screen, the source picture and the "virtual" picture we want to draw. To be clear in this tutorial we will name the coordinates:
 "column" and "row" in the virtual picture, so AB is the column increment and AD is the row increment
 Uppercase X and Y for the coordinates within the source picture
For commonality with the existing screen coordinate system, the other coordinate systems have the same orientation, i.e. origin at the topleft, X axis pointing to the right and Y axis pointing to the bottom.
The origin, column and row increments have two X and Y components in the 2D source picture, that we will name as following:

originX
andoriginY
for the origin, point A on the diagram 
columnIncrementX
andcolumnIncrementY
for the columns, in the diagram this vector is approximately equal to (3.2,1.2) 
rowIncrementX
androwIncrementY
for the columns, in the diagram this vector is approximately equal to (1.0,2.1)
Last thing to note: the increments have fractional values, not integer values. This is necessary to keep a good precision while drawing. In the next section, these increments will be encoded using the float
type.
Implementation with float variables
Let's implement the algorithm. First define constant values for the origin and increments in the loop()
function, just before the line with startTime = millis()
. The constant values are given here as an example to display a portion of the source picture:
// Coordinates of the first point to draw float originX=10; float originY=10; // Column increment float columnIncrementX = 3.2; float columnIncrementY = 1.2; // Row increment float rowIncrementX = 1.0; float rowIncrementY = 2.1;
Then replace the line gb.display.drawImage(0, 0, Picture);
by the following lines to implement the complete algorithm with floats. The variables startX
and startY
contain the coordinates of the first point of a row, X
and Y
the coordinates of the current point within the source picture and px
and py
the coordinates of the current point on the screen:
// Inits the counters and performs loop float startX=originX, startY=originY, X, Y; for(int py=0; py<64; py++) { X=startX; Y=startY; Color background = gb.createColor(0, 0, py*4); for(int px=0; px<80; px++) { int intx = (int)X; int inty = (int)Y; if(intx<0  intx>255  inty<0  inty>255) gb.display.drawPixel(px, py, background); else gb.display.drawPixel(px, py, Picture.getPixelColor(intx, inty));X+=columnIncrementX; Y+=columnIncrementY; } startX+=rowIncrementX; startY+=rowIncrementY; }</pre>
First, as you can see, we simply use the functions
drawPixel
andgetPixelColor
to transfer the color of the pixel from the source picture to the screen (more on that in the next part).Second, before getting the pixel color from the source picture, the float coordinates are converted to integers using a cast
(int)X
, and the resulting position is then checked to ensure that it is in the source picture. If not, a simple shade is displayed as background color. As a remark, if we do not check the position, then at some point a fatal error is raised because thegetPixelColor
function tries to pick a pixel in some random memory area that is not accessible to the processor.Time to execute and have a look at the generated picture, which should look like this:
Part 3: Animation and first optimizations
At this point we are able to draw the picture according to constant increments. In this part we will add animation and start to optimize.
Animation
Well, now we can draw the picture, it is not very quick but it works (trust me, it will get faster in the next parts). We know that we can modify the origin and the increments vectors to change the viewpoint of the observer, but how?
First of all, we want to center the draw on the center of the source picture. The vectors are interdependent so we will need some basic equations (maths are useful after all!).
More precisely, when we draw the pixel at the center of the screen at (40,32), we should be at the position (128,128) in the source picture. According to the draw algorithm (look at the diagram), this point is reached on the row 32 (so after adding 32 times the row increment vector from the origin) and the column 40 (so after adding 40 times the column increment vector from the origin).
In the form of equations projected on the X and Y axes, it looks like:
 originX + 40*columnIncrementX + 32*rowIncrementX = 128
 originY + 40*columnIncrementY + 32*rowIncrementY = 128
The resolution to compute the origin from the increments is immediate:
 originX = 128  40*columnIncrementX  32*rowIncrementX
 originY = 128  40*columnIncrementY  32*rowIncrementY
Now we have the origin and need to compute the increments. We want a simple rotation, i.e. we have to define rotating increments vectors, and such that the column and row increment vectors are perpendicular.
Classically, the trigonometric functions sine and cosine are used for this purpose. These functions oscillate between 1 and 1 with a period of 2*PI.
To draw a circle centered on the origin, if t is varying between 0 and 2*PI, the successive points (x,y) on the circle of radius R are located at the coordinates:
 x = R*cos(t)
 y = R*sin(t)
The increments vectors will be computed exactly as a point on a circle (the vector from the center to the point on the circle is indeed rotating as t increases):
 columnIncrementX = R*cos(t)
 columnIncrementY = R*sin(t)
For the row increment vector, this is the same with an added angle of 90 degrees, i.e. PI/2 in radians:
 rowIncrementX = R*cos(t + PI/2)
 rowIncrementY = R*sin(t + PI/2)
We will define a radius of 1 and use as variable varying with the time a new variable angle
incremented by 2*PI/100 at each frame, hence after 100 frames the cycle will be completed.
The basics are known, so let's go to the implementation.
First define the new global variable called angle
before the loop()
function:
// Increasing angle to set observer's position float angle=0.0;
The angle needs to be increased at each frame. Add these lines at the beginning of the loop()
function:
// Increments angle angle = angle + 2*3.1416/100;
Rewrite the following lines such that the increments follow circles of radius 4 (the origin being computed as described above):
// Increments to go to next column (increment on x on drawn image) float columnIncrementX = 4*cos(angle); float columnIncrementY = 4*sin(angle); // Increments to go to next row (increment on y on drawn image) float rowIncrementX = 4*cos(angle+3.1416/2); float rowIncrementY = 4*sin(angle+3.1416/2); // Coordinates of the first point to draw float originX=12840*columnIncrementX32*rowIncrementX; float originY=12840*columnIncrementY32*rowIncrementY;
With these settings, the animation should look like this (by the way for the animated gifs of this tutorial, ezgif.com was used to scale the images 3 times and adapt the animation speed to be near what can be seen on the Gamebuino):
Finally, to make the animation a little more interesting, let's add an oscillating zoom and a 3Dlike effect. We will define a new zoom
variable that oscillates with a frequency different than the one for the rotation (here we just use a sine function with the angle
divided by an arbitrary value). The variable zoom
is in turn used as a factor on the length of the row and column increments, such that it gives an effect of zoom in/out oscillation. In addition, we reduce the radius of the column increment to give a 3D isometric effect.
For the implementation, add the zoom
definition and rewrite the lines defining the increments:
// Computes zoom level float zoom = 2.2 + 2*sin(angle/1.618);// Increments to go to next column (increment on x on drawn image)
float columnIncrementX=zoom2cos(angle);
float columnIncrementY=zoom2sin(angle);
// Increments to go to next row (increment on y on drawn image)
float rowIncrementX=zoom3cos(angle+3.1416/2);
float rowIncrementY=zoom3sin(angle+3.1416/2);
Do not hesitate to experiment a lot with these settings! You can change the factors or the functions to create different effects.
Now the animation should look like this:
Well, not bad after all, but still not very fast to draw. Optimizations will be introduced in the next section.
Optimization of getPixelColor
The idea of the first optimization is to replace the getPixelColor
function by a direct access to the memory. We already check that the coordinates are in the range [0..255], so it is safe to pick the pixel color directly.
The data of the source picture is stored in the flash memory as a constant array PictureData
. It starts with 6 uint16_t
values that we do not need, after this header it is a onedimensional array. A pixel at position x,y is stored at the offset 6+y+x*256 (the multiplication by 256 can be written as well as a bit shift of 8 bits).
So rewrite the following line in the draw algorithm to replace the getPixelColor
function:
gb.display.drawPixel(px, py, (Color)PictureData[6+intx+(inty<<8)]);
The cast to (Color)
is necessary as a formal type conversion, there is no data conversion.
Once optimized, the result is not significantly quicker, let's implement another optimization.
Optimization of drawPixel
Here again, we want to replace a function by a direct access to the memory. Looking at the header file of Image, it appears that the picture data is contained in a public member of the Image
class called _buffer
.
This member has the type uint16_
t*
i.e. it is a pointer on an array of Color
pixels. Unlike the static Image
object, this array of pixels has no header (dimensions of the picture are placed in actual members of the class). The modification is then straightforward.
Rewrite the following lines within the draw algorithm to replace the two occurrences of drawPixel
:
if(intx<0  intx>255  inty<0  inty>255) gb.display._buffer[p++] = (uint16_t)background; else gb.display._buffer[p++] = PictureData[6+intx+(inty<<8)];
Please note that this optimization is clearly to be put in the "danger zone". As far as I can see, the _buffer
member is not officially documented and may change without notice, leading to potential incompatibilities in future versions of the Gamebuino API (nevertheless a compiled .bin would work further). This is the price to pay for performance.
Once optimized, the result is a bit quicker:
Part 4: Optimization with fixedpoint arithmetic
In this part, we will change small parts of the draw algorithm to increase dramatically the speed.
Simple 32bits fixedpoint arithmetic without bit shift
The next optimization is more tricky. It relies on fixedpoint arithmetic, which is a generalpurpose technique to perform computations with fractional numbers, but without using the floatingpoint types float
or double
. The same technique is used, probably among many other creations, in Fractalino and in Raycaster.
Understanding the bits and bytes of this technique is not completely mandatory here, and this section much deeper than necessary for the optimization, so you can directly go to the implementation in the next section if you wish.
With fixedpoint arithmetic, all fractional numbers are multiplied by a common factor and are stored as integers.
For example, in base 10, let's say that we want to add a=1.3 and b=2.6 taking 10 as our common factor. First convert a and b by multiplying them with the common factor: A = a * 10 = 13 and the same for B = 26. Once converted in fixedpoint representation, we just add A and B as C = A + B = 13 + 26 = 39. Finally, we convert the result back as c = C/10 = 3.9. Here it is easy to see that additions or subtractions of numbers multiplied by a same factor will lead to correct results, we apply just the distributive property.
In base 2 (binary), powers of 2 are used as common factors such that multiplications and divisions can be done as bit shifts. In our examples, we will not use bit shifts, preferring carefully selected access to bytes or words, which are equivalent to multiplying/dividing by factors of 256.
Namely, the processor of the Gamebuino can process 32bits signed or unsigned integers, and it is always possible to access parts of the 32bits word, either in two 16bits words or in 4 bytes.
As an example, if we encode the value 1234.5678 on 32 bits with 16 bits after the dot, the factor is 2^16 = 65536 = 256*256.
Then 1234.5678 * 65536 = 80908635.3408. The last digits after the dot (.3408) will not be encoded, so we look at the value 80908635=0x04D2915B
.
At this point it is very important to mention that the processor of the Gamebuino uses a little endian encoding. It means that our value is represented in memory in reverse order i.e. as the sequence { 0x5B, 0x91, 0xD2, 0x04 }
.
The different views of the 32 bits in memory are represented below (unsigned values only):
    Hexadecimal  5B  91  D2  04      4 bytes  091  145  210  004    2 x 16 bits  37211  1234   1 x 32 bits  80908635  
Logically, we find the value 1234 (integer part of the value 1234.5678 we encoded) as the second 16bits integer, and the last byte (here 004) is not zero because 1234 cannot be encoded as one byte (greater than 255). This will be useful to determine if a coordinate is still in the range [0..255] i.e. within the source picture.
The same kind of decomposition can be done for a negative value. Let's look at a new example with the negative value 123.456 i.e. 123.456 * 65536 = 8090812.416. We round the value giving 8090812 = 0xFF848B44
encoded as signed integer.
To give a complete overview (we won't look at all numbers!), the different interpretations of the 32 bits according to different data types, signed or not, are depicted below for 0xFF848B44
encoded in littleendian:
    Hexadecimal  44  8B  84  FF      as uint8_t  068  139  132  255      as int8_t  188  117  124  001    as uint16_t  35652  65412    as int16_t  29884  124   as int32_t  8090812  
The interpretation is complicated in this case. The main thing we need to check is the second int16_t
equal to 124. We encoded 123.456, so we could expect 123 here. This difference of 1 is strange at first sight but actually logical.
To try to understand, let's encode 0.0 and decrease it by 0.2 to obtain the encoded version of 0.2, i.e. A = 0x00000000
, 0.2 * 65536 = 3276.8 so after rounding B = 32769 = 0x00008001
, then A  B = 32769 = 0xFFFF7FFF
. We see that to encode 0.2, the mostsignificant byte of the integer part is equal to 0xFF
= 1 (second byte here, would be the third byte in littleendian), not zero although we encode 0.2. Intuitively it cannot be 0 because it would mean that +0.1 and 0.1 would be encoded with the same mostsignificant byte.
Actually it can be interpreted as following: when a value W is separated into its 2 parts W = M*256 + L, the mostsignificant part M is the next lower multiple (of 256), and the leastsignificant part L is added to it.
It is obvious with positive values, e.g. for 260 the next lower multiple is 1*256 (so M=0x01
) and the value 4 is added (so L=0x04
, A=0x0104
).
This is the same with negative values, e.g. for 260, the next lower multiple is (2)*256=512 (2 encoded as M=0xFE
) and the value 512260=252 (L=0xFC
) is added to it (so M=0xFEFC
). Strange but it works.
To come back to our first example, the value 0xFF84
=(1)*256+132=124. To reconstruct the number correctly, we have to interpret the mostsignificant byte as signed and the leastsignificant ones as unsigned. The same to reconstruct the fractional number, we interpret the integer part as signed (mostsignificant) and the fractional part as unsigned: 0xFF848B44
/65536 = 124 + 35652/65536 = 123.45599 (precision was lost at the beginning while rounding).
Fixedpoint arithmetic work as well for multiplication and division, but it is another story. The main limitation of the technique is of course that it will completely fail if the result of some operations exceeds the maximum value (here 32768 or +32767), even temporarily.
Implementation of the optimization with fixedpoint numbers
After the long deepdive into fixedpoint arithmetic of the previous section, let's summarize and remember that for fixedpoint numbers on 32 bits, 16 bits after the dot:
 We define the factor F = 256*256 = 65536 (it corresponds to a shift of 16 bits due to the 16 bits after the dot)
 A negative or positive float n is encoded as fixedpoint using
N=(int32_t)round(n*F)
 Numbers encoded as fixedpoint can be added or subtracted just like large integers, resulting in encoded numbers (actually all operations are possible with more complex formulas)
 An encoded number is decoded back to a float with
n=(float)N/(float)F
 To get the fractional or integer parts directly from an encoded number, thanks to our choice of 16 bits after the dot, we can just access the data differently, i.e. as 16bits integer or bytes, instead of performing conversions or bit shifts
 When accessing different parts of the 32 bits, it is important to interpret the mostsignificant part as signed integer, and the leastsignificant part as unsigned integer, due to the encoding of negative values
 Data is encoded in "littleendian" on the Gamebuino, i.e. the number
0x01234567
is represented in memory as{ 0x67, 0x45, 0x23, 0x01 }
In C and C++, the data structure union
can be used to access the same memory location with different types.
Insert the following global type definition, which gives a full access as different types (not all types will be used):
union FixedPoint { int32_t asInt32; int16_t asInt16[2]; uint16_t asUInt16[2]; int8_t asInt8[4]; uint8_t asUInt8[4]; };
Then, rewrite the complete draw code as following, first with the definition of the counters and then the conversion from floats:
// Defines the increments as FixedPoint types and convert values from floats FixedPoint columnIncrementFPX, columnIncrementFPY, rowIncrementFPX, rowIncrementFPY; columnIncrementFPX.asInt32 = (int32_t)round(columnIncrementX*65536.0); columnIncrementFPY.asInt32 = (int32_t)round(columnIncrementY*65536.0); rowIncrementFPX.asInt32 = (int32_t)round(rowIncrementX*65536.0); rowIncrementFPY.asInt32 = (int32_t)round(rowIncrementY*65536.0); // Inits the counters as FixedPoint types and converts values from floats FixedPoint startX, startY, X, Y; startX.asInt32 = (int32_t)round(originX*65536.0); startY.asInt32 = (int32_t)round(originY*65536.0);
Second, rewrite the main loop as following:
// Performs loop int p=0; for(int py=0; py<64; py++) { X.asInt32 = startX.asInt32; Y.asInt32 = startY.asInt32; Color background = gb.createColor(0,0,py*4); for(int px=0; px<80; px++) { int intx = X.asInt16[1]; int inty = Y.asInt16[1]; if(intx<0  intx>255  inty<0  inty>255) gb.display._buffer[p++] = (uint16_t)background; else gb.display._buffer[p++] = PictureData[6+intx+(inty<<8)];X.asInt32 += columnIncrementFPX.asInt32; Y.asInt32 += columnIncrementFPY.asInt32; } startX.asInt32 += rowIncrementFPX.asInt32; startY.asInt32 += rowIncrementFPY.asInt32; }</pre>
Note that:
 The coordinates within the picture,
intx
andinty
, are directly extracted from the integer parts (signed) of the current coordinatesX
andY
.  The addition of the increments are performed classically on 32 bits, adding the integer and fractional parts of the encoded numbers in one operation
The speed increase is astonishing:
Infinite Picture
A variant of the previous optimization can be done by changing the lines that define intx
and inty
, check the coordinates and write the pixels. Here we want to use a direct access to the data and simplify the current position check.
Rewrite these lines as following:
if(X.asInt8[3]!=0  Y.asInt8[3]!=0) gb.display._buffer[p++] = (uint16_t)background; else gb.display._buffer[p++] = PictureData[6+X.asUInt8[2]+(Y.asUInt8[2]<<8)];
Worth mentioning that we use directly the bytes of the integer part of the current position:
 There is no comparison any more of
X
andY
, we just use the mostsignificant byte and check if it equals zero:if(X.asInt8[3]!=0  Y.asInt8[3]!=0)
. This is only possible due to the choice of a picture of 256x256 pixels, if the mostsignificant byte equals zero the value if automatically positive and less than 256.  We use directly and only the leastsignificant byte, obviously in range [0..255], to compute the pixel position within the source picture.
The performance increase of this optimization is marginal, but it paves the way for another effect: as we now use directly an unsigned byte for each coordinate within the source picture, we are sure to be inside the picture so it is safe to remove the test. By doing so, due to the nature of the computation, it is like drawing the picture with coordinates modulo 256.
Comment out these lines as following:
//if(X.asInt8[3]!=0  Y.asInt8[3]!=0) //gb.display._buffer[p++] = (uint16_t)background; //else
At the end, the result should look like an infinite tiling of the source picture:
Conclusion
I hope you enjoyed this tutorial, learned the basics of displaying pixels after a simple transformation, and how to optimize such a draw technique. The result is very fast and could be used outofthebox for a game, as long as not too much additional data is necessary (the picture itself uses half of the available flash memory)!.
The strict affine transformation is completed. Possible next steps now would be to:
 Apply the same technique on a tilemap instead of a bitmap such that more flash memory remains available. With carefully chosen tile size and map size, bitshits can be used to build the location of the source pixel in a tile.
 Switch to highdefinition using the highdefinition display method by aoneill. At the end we are at about 34 milliseconds to draw. Optimistically a draw of 4 times the number of pixels in less than 20 milliseconds seems reachable.
 Add a variant of
drawImage
with integrated rotation and scale for any picture  Use 3D projection formulas and modify the increments at each drawn line to mimic the Mode 7 of the SNES. Actually, if you download and install the .bin file, there is a "hidden level" with 3D projection and animation (hopefully for a future tutorial).
That's all folks!