Opened 5 years ago

Closed 4 years ago

#16766 closed optimization (fixed)

Optimize drawing bitmaps with transparency on wxMemoryDC without transparency

Reported by: awi Owned by: vadz
Priority: normal Milestone:
Component: wxMSW Version: dev-latest
Keywords: alpha channel AlphaBlend AlphaBlt Cc:
Blocked By: Blocking:
Patch: yes

Description

The patch implemented in the AlphaBlt function to fix the issue with drawing true RGBA bitmaps over 32-bit RGB bitmaps that have no effective alpha channel (see #14403) draws the source RGBA bitmap directly on the target bitmap and afterwards resets the alpha channel data to remove the "garbage" introduced by the AlphaBlend API.
The alpha channel is reset by iterating "manually" over all pixels of the destination bitmap. This can be expensive for large destination bitmaps, and the cost grows further if the destination bitmap is a DDB, which must be converted to a DIB before the iteration. This is especially painful when several small RGBA bitmaps are drawn over a large destination bitmap: the alpha channel of the whole destination must be reset even if the drawing area is small, and the overhead of the accumulated iterations is significant.

An alternative method for this special drawing could be as follows:

  1. Create a temporary 24-bit bitmap of the same size as the destination drawing area.
  2. Copy the content of the destination area to this temporary bitmap (there is no data loss because the destination is a 32-bit bitmap with no real alpha channel).
  3. Draw the source RGBA bitmap on the temporary bitmap. Because the temporary bitmap is only 24-bit, the AlphaBlend API doesn't introduce any "garbage" into the output.
  4. Copy the temporary bitmap back to the destination area.

This way it is not necessary to reset the alpha channel after invoking AlphaBlend because it operates on a 24-bit destination.
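
The four steps above can be sketched on plain pixel buffers. This is only an illustrative model, not the code from the attached patch: `Bitmap32` and `BlendVia24BitTemp` are hypothetical names, the buffers stand in for real GDI bitmaps, and the blend formula assumes a premultiplied source, as AlphaBlend does with per-pixel alpha.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 32-bit bitmap: tightly packed B,G,R,A bytes.
struct Bitmap32
{
    int w, h;
    std::vector<std::uint8_t> px;
};

// Blend a premultiplied RGBA source onto the area of dst starting at (x, y),
// going through a temporary 24-bit buffer so that the destination's alpha
// bytes are never written. Assumption: the area lies entirely inside dst.
void BlendVia24BitTemp(Bitmap32& dst, const Bitmap32& src, int x, int y)
{
    assert(x >= 0 && y >= 0 && x + src.w <= dst.w && y + src.h <= dst.h);

    const std::size_t n = std::size_t(src.w) * src.h;
    std::vector<std::uint8_t> tmp(n * 3);

    // Byte offset in dst of the pixel at (row, col) of the drawing area.
    auto dstAt = [&](int row, int col) {
        return (std::size_t(y + row) * dst.w + (x + col)) * 4;
    };

    // Steps 1 and 2: temporary 24-bit copy of the destination area.
    for (int row = 0; row < src.h; ++row)
        for (int col = 0; col < src.w; ++col)
            for (int c = 0; c < 3; ++c)
                tmp[(std::size_t(row) * src.w + col) * 3 + c] =
                    dst.px[dstAt(row, col) + c];

    // Step 3: blend into the temporary buffer: out = src + dst * (1 - alpha).
    for (std::size_t i = 0; i < n; ++i)
    {
        const unsigned a = src.px[i * 4 + 3];
        for (int c = 0; c < 3; ++c)
        {
            const unsigned v =
                src.px[i * 4 + c] + (tmp[i * 3 + c] * (255 - a) + 127) / 255;
            tmp[i * 3 + c] = static_cast<std::uint8_t>(std::min(v, 255u));
        }
    }

    // Step 4: copy the colour channels back; alpha bytes stay untouched.
    for (int row = 0; row < src.h; ++row)
        for (int col = 0; col < src.w; ++col)
            for (int c = 0; c < 3; ++c)
                dst.px[dstAt(row, col) + c] =
                    tmp[(std::size_t(row) * src.w + col) * 3 + c];
}
```

In this model the work is proportional to the size of the source bitmap rather than the destination, which is the property the method relies on for small sources.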

The benchmark results look quite good (run in Debug mode):

200 iterations
target bitmap size: 1000 x 800

                                  Drawing time (ms)
source bitmap size:   16 x 16          400 x 400         800 x 800
algorithm:         current   new     current   new    current   new
--------------------------------------------------------------------
 src     dst
 RGB  on DIB 24        3       3      19        20        70     73
 RGB  on DIB 32        2       2      70        73       278    278
 RGB  on DDB           3       4      69        69       291    281

*RGBA on DIB 24        3       3      276      276      1105   1115
*RGBA on DIB 32     1949      25     2030      523      2456   2212
*RGBA on DDB        3134      26     3270      534      3734   2225

Note: Results not marked with '*' are presented for comparison purposes only (these drawings are not handled by AlphaBlt function).

The smaller the source bitmap, the more significant the improvement, but even for quite large bitmaps the new method is generally faster.

The attached patch introduces this new method.

Attachments (2)

Optimize-AlphaBlt-function.patch download (4.4 KB) - added by awi 5 years ago.
Optimized AlphaBlt function.
Optimize-AlphaBlt-function_v2.patch download (736 bytes) - added by awi 5 years ago.
Optimized AlphaBlt function v2.

Change History (12)

Changed 5 years ago by awi

Optimized AlphaBlt function.

comment:1 Changed 5 years ago by awi

Unfortunately, in Release mode the results are not as clear-cut as in Debug mode (see below). Apparently, iterating over the bitmap is especially affected by the debug code, which is why the current algorithm is always slower in that case.
In Release mode the results depend on the size of the source bitmap, the size of the destination bitmap and the type of the destination bitmap (DDB or DIB 32). Generally it looks like this:

                                       current     new
 src small  -> dst small DIB 32           +         +/-
 src small  -> dst small DDB              +         +/-
 src small  -> dst medium DIB 32                    +
 src small  -> dst medium DDB                       +
 src medium -> dst medium DIB 32          +
 src medium -> dst medium DDB             +
 src small  -> dst large DIB 32                     +
 src small  -> dst large DDB                        +
 src medium -> dst large DIB 32           +
 src medium -> dst large DDB                        +
 src large  -> dst large DIB 32           +
 src large  -> dst large DDB              +

Apparently, the new algorithm is faster at drawing small RGBA bitmaps over medium and large destination bitmaps. It is also faster at drawing medium source bitmaps over large destination DDBs.
The case of drawing a small source over a large destination seems important because in practice it corresponds to drawing icons.
So maybe both algorithms should be implemented and selected conditionally?

Target bitmap size: 1000 x 800
                                   Drawing time (ms)
source bitmap size:   16 x 16          400 x 400         800 x 800
algorithm:         current   new     current   new    current   new
--------------------------------------------------------------------
 src     dst
 RGB  on DIB 24        0       3       10       18        60     70
 RGB  on DIB 32        0       3       70       68       280    280
 RGB  on DDB           0       4       70       69       270    270

*RGBA on DIB 24        0       3      270      273      1091   1090
*RGBA on DIB 32      130      24      200      510 !     410   2141 !
*RGBA on DDB        1530      27     1361      521      1621   2171 !

Target bitmap size: 500 x 400
                          Drawing time (ms)
source bitmap size:   16 x 16          400 x 400
algorithm:         current   new     current   new
---------------------------------------------------
 src     dst
 RGB  on DIB 24        3       3       15       16
 RGB  on DIB 32        2       3       69       69
 RGB  on DDB           4       3       69       69

*RGBA on DIB 24        3       5      275      274
*RGBA on DIB 32       39      34       99      513 !
*RGBA on DDB          29      33      344      520 !

Target bitmap size: 100 x 50
                  Drawing time (ms)
source bitmap size:   16 x 16
algorithm:         current   new
---------------------------------
 src     dst
 RGB  on DIB 24        5       5
 RGB  on DIB 32        0       0
 RGB  on DDB           5       5

*RGBA on DIB 24        5       5
*RGBA on DIB 32        5      25
*RGBA on DDB          25      25

comment:2 Changed 5 years ago by awi

And yet another option:
If the alpha channel were fixed only in the area actually drawn on by the AlphaBlend API, instead of in the whole target bitmap, drawing performance would improve in all cases.
The benchmark in Release mode looks as follows:

200 iterations, Release

Target bitmap size: 1000 x 800
                                   Drawing time (ms)
source bitmap size:   16 x 16          400 x 400         800 x 800
algorithm:         current   new2    current   new2   current   new2
---------------------------------------------------------------------
 src     dst
 RGB  on DIB 24        0       0       10       25        60     70
 RGB  on DIB 32        0       0       70       75       280    280
 RGB  on DDB           0       3       70       75       270    270

 RGBA on DIB 24        0       3      270      270      1091   1092
*RGBA on DIB 32      130       3      200       95       410    350
*RGBA on DDB        1530    1267     1361     1336      1621   1591

Target bitmap size: 500 x 400
                          Drawing time (ms)
source bitmap size:   16 x 16          400 x 400
algorithm:         current   new2    current   new2
----------------------------------------------------
 src     dst
 RGB  on DIB 24        3       3       15       15
 RGB  on DIB 32        2       3       69       70
 RGB  on DDB           4       3       69       70

 RGBA on DIB 24        3       5      275      270
*RGBA on DIB 32       33      34       99       85
*RGBA on DDB         276     260      344      330

Target bitmap size: 100 x 50
                  Drawing time (ms)
source bitmap size:   16 x 16
algorithm:         current   new2
----------------------------------
 src     dst
 RGB  on DIB 24        5       3
 RGB  on DIB 32        0       0
 RGB  on DDB           5       3

 RGBA on DIB 24        5       3
*RGBA on DIB 32        5       3
*RGBA on DDB          25      23

It can be seen that AlphaBlend performs very well when drawing on a 32-bit DIB. Even with the subsequent manual fixing of the alpha channel, the overall performance is better than when drawing on a 24-bit DIB without any post-processing. If the destination bitmap is a DDB, only a slight improvement can be seen because much of the overhead is caused by the DDB -> DIB32 conversion.
For small and medium RGBA bitmaps drawn on target DDB bitmaps this optimization doesn't improve much, and maybe in this case the previous algorithm, which performs much better for small source bitmaps, could be applied.
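
The option described in this comment, fixing the alpha channel only in the area that was drawn on, can be sketched as below. This is a minimal model on a packed BGRA buffer, not the v2 patch itself: `ResetAlphaInRect` is a hypothetical name, the value written to the alpha byte is passed in as a parameter because it is an assumption here, and the rectangle is assumed to lie entirely inside the bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Overwrite the alpha byte of every pixel inside the rectangle
// (x, y, w, h) of a tightly packed BGRA buffer of bmpW x bmpH pixels.
// Returns the number of pixels visited, to make the cost visible.
// Assumption: the rectangle is fully contained in the bitmap.
std::size_t ResetAlphaInRect(std::vector<std::uint8_t>& bgra,
                             int bmpW, int bmpH,
                             int x, int y, int w, int h,
                             std::uint8_t resetValue)
{
    assert(bgra.size() == std::size_t(bmpW) * bmpH * 4);
    assert(x >= 0 && y >= 0 && w >= 0 && h >= 0);
    assert(x + w <= bmpW && y + h <= bmpH);

    std::size_t visited = 0;
    for (int row = y; row < y + h; ++row)
    {
        for (int col = x; col < x + w; ++col)
        {
            // Byte 3 of each pixel is the alpha channel.
            bgra[(std::size_t(row) * bmpW + col) * 4 + 3] = resetValue;
            ++visited;
        }
    }
    return visited;
}
```

For a 16 x 16 source drawn on a 1000 x 800 target, such a loop visits 256 pixels instead of 800,000, which matches the direction of the DIB 32 results above.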

Changed 5 years ago by awi

Optimized AlphaBlt function v2.

comment:3 follow-up: Changed 4 years ago by vadz

  • Owner set to vadz
  • Status changed from new to accepted

Sorry for taking so long to get back to this. The latest (v2) patch is obviously correct and I'll apply it soon. I'm not sure what to do about the first patch though: do you still think it's worth using it in some cases? If so, which ones?

It's also still not totally clear how much we lose on this "resetting the alpha" step as I can't easily benchmark this myself. It would be great if you could add the code you used for benchmarking to tests/benchmarks/graphics.cpp. I also think that we really should benchmark the difference between drawing on a 32 bpp bitmap with alpha (e.g. a single transparent pixel) and without it to see how much difference this extra step makes.

I'm keeping this open for now, but if you think the first patch is not needed any more, please close this. Thanks!

comment:4 Changed 4 years ago by vadz

Unfortunately Trac lost track of several commits, including the one related to this ticket:

d04e25699323d91c859988918284b0324a2297c4/git-wxWidgets

comment:5 in reply to: ↑ 3 Changed 4 years ago by awi

  • Resolution set to fixed
  • Status changed from accepted to closed

Replying to vadz:

It's also still not totally clear how much we lose on this "resetting the alpha" step as I can't easily benchmark this myself. It would be great if you could add the code you used for benchmarking to tests/benchmarks/graphics.cpp. I also think that we really should benchmark the difference between drawing on a 32 bpp bitmap with alpha (e.g. a single transparent pixel) and without it to see how much difference this extra step makes.

OK, I will try to add some alpha-related benchmarks.

comment:6 Changed 4 years ago by Vadim Zeitlin <vadim@…>

In 9bcafd02f42c3559d7b7218976d54e7febf94df4/git-wxWidgets:

Revert "Optimize drawing of small bitmaps with alpha in wxMSW"

This reverts commit d04e25699323d91c859988918284b0324a2297c4 because it
results in crashes due to writing out of bounds of the bitmap: nothing
guarantees that the entire (x, y, dstWidth, dstHeight) rectangle fits in the
destination bitmap and so the code of this commit could merrily overflow it
and did it as could be seen e.g. in the HTML test sample after scrolling
around a little.

See #16766.

comment:7 Changed 4 years ago by vadz

  • Resolution fixed deleted
  • Status changed from closed to reopened

@awi: Sorry, I had to revert this because I don't have time to fix it right now and this change contained a fatal bug resulting in memory corruption. It should be fixed to only reset alpha for the pixels actually inside the bitmap.
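
One way to restrict the reset to pixels actually inside the bitmap is to intersect the drawing rectangle with the bitmap bounds before iterating. The following is only a sketch of that idea, not the fix that was later applied: `Rect` and `ClipToBitmap` are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical rectangle type: top-left corner plus size, in pixels.
struct Rect
{
    int x, y, w, h;
};

// Intersect r with the bounds of a bmpW x bmpH bitmap. The result has
// zero width and height if the rectangle does not overlap the bitmap.
Rect ClipToBitmap(Rect r, int bmpW, int bmpH)
{
    assert(bmpW >= 0 && bmpH >= 0);

    const int left   = std::max(r.x, 0);
    const int top    = std::max(r.y, 0);
    const int right  = std::min(r.x + r.w, bmpW);
    const int bottom = std::min(r.y + r.h, bmpH);

    if (right <= left || bottom <= top)
        return Rect{0, 0, 0, 0}; // nothing to reset

    return Rect{left, top, right - left, bottom - top};
}
```

A loop that resets alpha over the clipped rectangle cannot write outside the bitmap, even when (x, y, dstWidth, dstHeight) only partially overlaps it.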

comment:8 Changed 4 years ago by awi

Right, partial overlapping was not taken into account in this patch. This is fixed in the recent patch: PR144.

New benchmarks added to graphics.cpp show a significant increase in drawing speed in some cases (drawing ARGB bitmaps on a 0RGB DIB or a DDB is about ten times faster in the results below). On my machine, with --bitmaps --memory --dc --width=600 --height=400 /N 50000 parameters, the results are as follows:
Without fix:

Benchmarking default memory DC: 50000 ARGB bitmaps done in 10213ms = 204.26us/bitmap
Benchmarking default memory DC: 50000 RGB bitmaps done in 612ms = 12.24us/bitmap
Benchmarking RGB memory DC: 50000 ARGB bitmaps done in 2445ms = 48.9us/bitmap
Benchmarking RGB memory DC: 50000 RGB bitmaps done in 346ms = 6.92us/bitmap
Benchmarking 0RGB memory DC: 50000 ARGB bitmaps done in 10340ms = 206.8us/bitmap
Benchmarking 0RGB memory DC: 50000 RGB bitmaps done in 547ms = 10.94us/bitmap
Benchmarking ARGB memory DC: 50000 ARGB bitmaps done in 822ms = 16.44us/bitmap
Benchmarking ARGB memory DC: 50000 RGB bitmaps done in 504ms = 10.08us/bitmap

Fixed:

Benchmarking default memory DC: 50000 ARGB bitmaps done in 984ms = 19.68us/bitmap
Benchmarking default memory DC: 50000 RGB bitmaps done in 571ms = 11.42us/bitmap
Benchmarking RGB memory DC: 50000 ARGB bitmaps done in 2429ms = 48.58us/bitmap
Benchmarking RGB memory DC: 50000 RGB bitmaps done in 360ms = 7.2us/bitmap
Benchmarking 0RGB memory DC: 50000 ARGB bitmaps done in 1042ms = 20.84us/bitmap
Benchmarking 0RGB memory DC: 50000 RGB bitmaps done in 526ms = 10.52us/bitmap
Benchmarking ARGB memory DC: 50000 ARGB bitmaps done in 807ms = 16.14us/bitmap
Benchmarking ARGB memory DC: 50000 RGB bitmaps done in 505ms = 10.1us/bitmap

comment:9 Changed 4 years ago by Vadim Zeitlin <vadim@…>

In 7ddb522ec2040eb0fb6472b104a19c727350ef7e/git-wxWidgets:

Extend benchmarks of drawing bitmaps on wxMemoryDC

Extended tests to determine speed of drawing RGB/ARGB bitmaps on target
bitmaps with different colour depths (RGB/0RGB/ARGB/system default).

See #16766.

comment:10 Changed 4 years ago by Vadim Zeitlin <vadim@…>

  • Resolution set to fixed
  • Status changed from reopened to closed

In c239160d33996c6f73d51c606cf35b52f7af78c9/git-wxWidgets:

Optimize AlphaBlt() to reset the minimal amount of pixels

Modify the loop fixing alpha channel value in order to increase speed when
ARGB bitmap is drawn on 0RGB DIB bitmap or 32-bit DDB bitmap.

Closes #16766.
