[-] Draw rotated pixels in src order -> cache write miss
[X] Use atan2 at beginning and end of line, interpolate the values in between
[X] Test pixel-perfect 0, 90
[X] Optimization for square images
[X] Fixed point computation
[-] -funroll-loops -> no gain
[-] restrict qualifier -> unavailable in C++
[ ] All positions as simple integer
[-] Rotate per channel -> no gain
[X] Cut image in tiles
[X] Overlap
[-] Rotate in one temp tile then copy/move it
[X] Align tiles in memory
[ ] Touch beginning of tile
[X] RGBX format (create pixel structure) on 8 bytes (can do computation in-place)
[X] Load pixels in 64-bit variable
[X] Directly load in SIMD 128-bit variable
[ ] Align memory on 16 bytes (would require padding)
[X] RGBX tiles
[ ] Pack 4 neighbors in 16B structure (aligned); each point is followed by the point below
[ ] Spiral layout?
[X] Interpolate using SIMD, SSE
[ ] Fix out-of-bounds pixel set
[X] Use padding
[ ] Image borders
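
Sketch for the "Fixed point computation" and "interpolate in between" items. It is only an illustration under assumptions that are not in the notes: 16.16 fixed point, nearest-neighbour sampling, one 8-bit channel, no tiling and no SIMD. The name rotate_fixed and its parameters are hypothetical. The row start is computed with cos/sin directly rather than atan2, and pixels are written in destination order, since src order was rejected above for cache write misses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the notes): rotate a w x h 8-bit
// single-channel image about its centre with nearest-neighbour sampling.
// Destination pixels are written in row order; the source position is
// computed once per row and then stepped in 16.16 fixed point.
std::vector<std::uint8_t> rotate_fixed(const std::vector<std::uint8_t>& src,
                                       int w, int h, double angle_rad) {
    assert(w > 0 && h > 0);
    assert(src.size() == static_cast<std::size_t>(w) * static_cast<std::size_t>(h));

    const double one = 65536.0;  // 1.0 in 16.16 fixed point (assumed format)
    const double c = std::cos(angle_rad);
    const double s = std::sin(angle_rad);
    const double cx = (w - 1) / 2.0;  // rotation centre
    const double cy = (h - 1) / 2.0;

    // Source step for one destination pixel to the right.
    const std::int64_t step_x = std::llround(c * one);
    const std::int64_t step_y = std::llround(-s * one);

    std::vector<std::uint8_t> dst(src.size(), 0);  // uncovered pixels stay 0
    for (int y = 0; y < h; ++y) {
        // Source position of the first pixel of the row (x = 0), in floating point.
        const double sx0 = c * (0 - cx) + s * (y - cy) + cx;
        const double sy0 = -s * (0 - cx) + c * (y - cy) + cy;
        // The +0.5 makes the later >> 16 round to the nearest pixel.
        std::int64_t fx = std::llround((sx0 + 0.5) * one);
        std::int64_t fy = std::llround((sy0 + 0.5) * one);
        for (int x = 0; x < w; ++x, fx += step_x, fy += step_y) {
            if (fx < 0 || fy < 0) continue;    // left of or above the source
            const std::int64_t ix = fx >> 16;  // integer part
            const std::int64_t iy = fy >> 16;
            if (ix >= w || iy >= h) continue;  // right of or below the source
            dst[static_cast<std::size_t>(y) * w + x] =
                src[static_cast<std::size_t>(iy) * w + static_cast<std::size_t>(ix)];
        }
    }
    return dst;
}
```

Under these assumptions 0 and 90 degrees should come out pixel perfect on a square image, because the fixed-point steps are then exact. For other angles the rounded step can drift by up to half a fixed-point unit per pixel along a row.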