I just submitted a post on Stack Overflow that also fits nicely here, so.. here we go :-)

The topic was how to convert ARGB to YUV using the GPU. Some time ago I developed and used the following OpenCL kernel to convert ARGB (the typical Windows bitmap pixel layout) to a planar memory layout with a full-sized Y plane followed by quarter-sized U and V planes, as input for libx264 encoding.

__kernel void ARGB2YUV (
                        __global unsigned int * sourceImage,
                        __global unsigned int * destImage,
                        unsigned int srcHeight,
                        unsigned int srcWidth,
                        unsigned int yuvStride // must be srcWidth/4 since we pack 4 pixels into 1 Y-unit (with 4 y-pixels)
                    )
{
    int i;
    unsigned int RGBs [ 4 ];
    unsigned int posSrc, RGB, Value4 = 0, Value, yuvStrideHalf, srcHeightHalf, yPlaneOffset, posOffset;
    unsigned char red, green, blue;

    unsigned int posX = get_global_id(0);
    unsigned int posY = get_global_id(1);

    if ( posX < yuvStride ) { 
        // Y plane - pack 4 y's within each work item
        if ( posY >= srcHeight )
            return;

        posSrc = (posY * srcWidth) + (posX * 4);

        RGBs [ 0 ] = sourceImage [ posSrc ];
        RGBs [ 1 ] = sourceImage [ posSrc + 1 ];
        RGBs [ 2 ] = sourceImage [ posSrc + 2 ];
        RGBs [ 3 ] = sourceImage [ posSrc + 3 ];

        for ( i=0; i<4; i++ ) { 
            RGB = RGBs [ i ]; blue = RGB & 0xff; green = (RGB >> 8) & 0xff; red = (RGB >> 16) & 0xff;

            Value = ( ( 66 * red + 129 * green + 25 * blue ) >> 8 ) + 16;
            Value4 |= (Value << (i * 8)); 
        } 
        destImage [ (posY * yuvStride) + posX ] = Value4; 
        return; 
    } 
    posX -= yuvStride; yuvStrideHalf = yuvStride >> 1; 

    // U plane - pack 4 u's within each work item 
    if ( posX >= yuvStrideHalf ) // U/V rows hold only yuvStrideHalf packed units
        return;

    srcHeightHalf = srcHeight >> 1;
    if ( posY < srcHeightHalf ) {
        posSrc = ((posY * 2) * srcWidth) + (posX * 8);

        RGBs [ 0 ] = sourceImage [ posSrc ];
        RGBs [ 1 ] = sourceImage [ posSrc + 2 ];
        RGBs [ 2 ] = sourceImage [ posSrc + 4 ];
        RGBs [ 3 ] = sourceImage [ posSrc + 6 ];

        for ( i=0; i<4; i++ ) { 
            RGB = RGBs [ i ]; 
            blue = RGB & 0xff; green = (RGB >> 8) & 0xff; red = (RGB >> 16) & 0xff;

            Value = ( ( -38 * red + -74 * green + 112 * blue ) >> 8 ) + 128;
            Value4 |= (Value << (i * 8));
        } 
        yPlaneOffset = yuvStride * srcHeight; 
        posOffset = (posY * yuvStrideHalf) + posX; 
        destImage [ yPlaneOffset + posOffset ] = Value4; 
        return; 
    } 
    posY -= srcHeightHalf; 
    if ( posY >= srcHeightHalf )
        return;

    // V plane - pack 4 v's within each work item
    posSrc = ((posY * 2) * srcWidth) + (posX * 8);

    RGBs [ 0 ] = sourceImage [ posSrc ];
    RGBs [ 1 ] = sourceImage [ posSrc + 2 ];
    RGBs [ 2 ] = sourceImage [ posSrc + 4 ];
    RGBs [ 3 ] = sourceImage [ posSrc + 6 ];

    for ( i=0; i<4; i++ ) { 
        RGB = RGBs [ i ]; 
        blue = RGB & 0xff; green = (RGB >> 8) & 0xff; red = (RGB >> 16) & 0xff;

        Value = ( ( 112 * red + -94 * green + -18 * blue ) >> 8 ) + 128;
        Value4 |= (Value << (i * 8));
    }

    yPlaneOffset = yuvStride * srcHeight;
    posOffset = (posY * yuvStrideHalf) + posX;

    destImage [ yPlaneOffset + (yPlaneOffset >> 2) + posOffset ] = Value4;
    return;
}

This code performs only 32-bit global memory accesses, while the 8-bit processing happens within each work item.
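To illustrate the packing, here is a minimal CPU-side sketch in plain C. It is not part of the original code; the names argb_to_y and pack_y4 are made up for this illustration. It uses the same integer coefficients as the Y branch of the kernel and can serve as a reference when checking the GPU output.

```c
#include <stdint.h>

/* Luma of one ARGB pixel, same integer coefficients as the kernel's Y branch. */
static uint32_t argb_to_y(uint32_t argb)
{
    uint32_t blue  = argb & 0xff;
    uint32_t green = (argb >> 8) & 0xff;
    uint32_t red   = (argb >> 16) & 0xff;
    return ((66 * red + 129 * green + 25 * blue) >> 8) + 16;
}

/* Pack the luma of four neighbouring pixels into one 32-bit word,
   first pixel in the lowest byte, as each Y-plane work item does. */
static uint32_t pack_y4(const uint32_t argb[4])
{
    uint32_t packed = 0;
    for (int i = 0; i < 4; i++)
        packed |= argb_to_y(argb[i]) << (i * 8);
    return packed;
}
```

For the four pixels black, white, pure red and pure blue (0xFF000000, 0xFFFFFFFF, 0xFFFF0000, 0xFF0000FF), pack_y4 returns 0x2851EB10, that is the Y bytes 0x10, 0xEB, 0x51 and 0x28 with the first pixel in the lowest byte.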

Oh.. and here is the code to invoke the kernel:

unsigned int width = 1024;
unsigned int height = 768;

unsigned int frameSize = width * height;
const unsigned int argbSize = frameSize * 4; // ARGB pixels

const unsigned int yuvSize = frameSize + (frameSize >> 1); // Y,U,V planes

const unsigned int yuvStride = width >> 2; // since we pack 4 RGBs into "one" YYYY

// Allocates ARGB buffer
ocl_rgb_buffer = clCreateBuffer ( context, CL_MEM_READ_WRITE, argbSize, 0, &error );
// ... error handling ...

ocl_yuv_buffer = clCreateBuffer ( context, CL_MEM_READ_WRITE, yuvSize, 0, &error );
// ... error handling ...

error = clSetKernelArg ( kernel, 0, sizeof(cl_mem), &ocl_rgb_buffer );
error |= clSetKernelArg ( kernel, 1, sizeof(cl_mem), &ocl_yuv_buffer );

error |= clSetKernelArg ( kernel, 2, sizeof(unsigned int), &height);
error |= clSetKernelArg ( kernel, 3, sizeof(unsigned int), &width);

error |= clSetKernelArg ( kernel, 4, sizeof(unsigned int), &yuvStride);
// ... error handling ...

const size_t local_ws[] = { 16, 32 };
const size_t global_ws[] = { yuvStride + (yuvStride>>1), height };

error = clEnqueueNDRangeKernel ( queue, kernel, 2, NULL, global_ws, local_ws, 0, NULL, NULL );
// ... error handling ...

Note: have a look at the work item calculations. Some additional code needs to be added (e.g. using mod to add enough spare work items) to make sure that the global work sizes are multiples of the local work sizes.
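One way to do that padding is sketched below, as an assumption about how it could be done rather than the code I actually shipped. The helper names round_up and padded_global_size are made up for this example. The spare work items only work out if the kernel's bounds checks make them return early.

```c
#include <stddef.h>

/* Round value up to the next multiple of multiple (multiple must be > 0). */
static size_t round_up(size_t value, size_t multiple)
{
    size_t remainder = value % multiple;
    return remainder == 0 ? value : value + (multiple - remainder);
}

/* Fill global_ws for the ARGB2YUV launch, padded to the local work size.
   Dimension 0 covers the Y units plus the half-width U/V units,
   dimension 1 covers the image rows. */
static void padded_global_size(size_t yuv_stride, size_t height,
                               const size_t local_ws[2], size_t global_ws[2])
{
    global_ws[0] = round_up(yuv_stride + (yuv_stride >> 1), local_ws[0]);
    global_ws[1] = round_up(height, local_ws[1]);
}
```

With the 1024x768 frame from above and a local size of { 16, 32 }, nothing changes (384 and 768 are already multiples). For an 800x600 frame, yuvStride is 200, so the global size grows from { 300, 600 } to { 304, 608 }.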

Raulka (a fellow user from xda-developers) has released an updated Floyd 2.4 theme suitable for VGA/WVGA/QVGA/WQVGA/Portrait/Landscape.

Features:

  • VGA sized beautiful graphics
  • 7 types of clocks (change with Floyd’s toggle button)
  • 4 types of icons (change with Floyd’s toggle button)
  • 4 types of buttons (change with Floyd’s toggle button)
  • Almost every icon/label has an ontap and/or onhold function
  • v2.4 – Only two CABs released, QVGA and VGA, which contain all previous versions in both portrait and landscape! Small graphics changes in the portrait versions too. Important! After changing the screen orientation from portrait to landscape or from landscape to portrait, first click the CPU usage (%) button and wait 7-8 seconds. This sets the correct xml and notestoday + todaycalendar settings for the new orientation!

Head over to xda-developers.com and check out Raulka’s Floyd theme.

Updated tinybatch.exe to support sending window messages.

Format: PostMessage WindowClass WindowTitle WM_MESSAGE wParam lParam
e.g.
# Set Vibra Mode
PostMessage HHTaskBar NULL 1156 2 0

# Set Normal Mode
PostMessage HHTaskBar NULL 1156 0 0

# Set Silent Mode
PostMessage HHTaskBar NULL 1156 3 0
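
To make the line format concrete, here is a rough sketch of how such a line could be split into its fields in plain C. This is not tinybatch's actual implementation; the struct post_message_cmd and the function parse_post_message are hypothetical names, and the sketch assumes that class and title contain no spaces.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one parsed PostMessage line. */
struct post_message_cmd {
    char window_class[64];
    char window_title[64];   /* kept as written, e.g. "NULL" */
    unsigned int message;    /* numeric WM_MESSAGE value */
    unsigned long wparam;
    long lparam;
};

/* Returns 1 if line matches
   "PostMessage WindowClass WindowTitle WM_MESSAGE wParam lParam",
   0 otherwise (comment lines such as "# ..." are rejected too). */
static int parse_post_message(const char *line, struct post_message_cmd *out)
{
    char keyword[16];
    int fields = sscanf(line, "%15s %63s %63s %u %lu %ld",
                        keyword, out->window_class, out->window_title,
                        &out->message, &out->wparam, &out->lparam);
    if (fields != 6 || strcmp(keyword, "PostMessage") != 0)
        return 0;
    return 1;
}
```

For the line "PostMessage HHTaskBar NULL 1156 2 0" this yields the class HHTaskBar, the title NULL, message 1156, wParam 2 and lParam 0.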

BatteryStatus Advanced is nominated for the “8th Annual Best Software Award 2008” in the category “Today Information plug-in” for touchscreen phone and non-phone devices, and that’s great :)
BatteryStatus Advanced has been renamed to HomeScreen PlusPlus UI and is software developed by me.

The development of BatteryStatus/HomeScreen++ started in 2006 as a simple Today plugin showing some indicators and doing some useful stuff for the XDA Neo. Since then, it has advanced a lot with the help of fellow users over at forum.xda-developers.com. It has evolved into a fully customizable plugin that can be used on any PocketPC device with a touchscreen running Windows Mobile 5.0 or higher, e.g. WM6 / WM6.1. While using the plugin is kept as simple as possible, many of the generic concepts it employs allow for creating feature-rich themes. Moreover, themes are adjusted automatically to fit the device’s screen resolution, that is, QVGA themes are scaled up to show correctly on VGA devices and vice versa.

See my previous post on the 8th Annual Best Software Awards 2008.

The Smartphone and Pocket PC magazine, published by Thaddeus Computing in Fairfield, is one of the most prominent print and online magazines for Pocket PCs and touchscreen devices (phone and non-phone). Every year, together with prominent people serving as a Board of Experts, they nominate new, popular and innovative software for such devices for the Best Software Award.

BatteryStatus Advanced is a 100% C/C++ coded application with a few asm-object-mods, so it runs very fast and without the need for additional frameworks.

With “HomeScreen ++ UI Edition” you can build your own user interface by means of xml and make use of external scripts or applications. You can create an application launcher, place a digital clock with seconds, or show the date and other indicators, such as CPU utilization or the memory usage of SD card, storage, RAM, or flash cards. Furthermore, you can integrate other Today plugins into the theme, which are loaded just as with the default Today screen.

With the current release candidate you can use fluid view-switching animations, such as slide in, slide out, blend in, blend out, page flip, or cube. The former ones perform very quickly and fluidly at QVGA resolution as well as at VGA resolution.

See the following videos made by fellow users over at www.actualitemobile.com (btw, the videos are slow in terms of frame rate, but the animations are more fluid in reality, since they are tuned for human-eye fps):

http://img532.imageshack.us/flvplayer.swf?f=Tbureauskinrj2&autostart=true

http://www.youtube.com/v/GuP0jAHex9o

So what comes next.. that’s a good question.. There are interesting feature requests here at the discussion place, and you can still make feature requests, but I can’t say for sure which of them will get implemented.
HomeScreen++UI version 1.06 is now in “Release Candidate” state and will be released as a final version soon.
Features planned for version 1.07, that is, features that are very likely to be implemented, are dll support (allowing programmers to supply dlls that add custom functions to HomeScreen++) and advanced landscape/portrait support. There are more features planned though ;)

Resources: