Archive for the ‘Uncategorized’ Category

SIGGRAPH 2010 google calendars

Thursday, June 10th, 2010

SIGGRAPH logoBack by popular demand!

To add to your google calendars, cut and paste the blah@group.calendar.google.com address for the calendar into the “Add a friend’s calendar” box in the “Other calendars” list on the left side of the google calendar page.

  • Courses
    e1980aphrhr0ifb1qitoh4hmu0@group.calendar.google.com
    XML iCal HTML
  • Panels
    ee5kcj6jll596j2rpjv90978ng@group.calendar.google.com
    XML iCal HTML
  • Papers (Previews from Ke-Sen Huang)
    39spm5aoav39om1nm8sg0a6n6s@group.calendar.google.com
    XML iCal HTML
  • Talks
    8afrj4lpp7lg027cppibus5rg0@group.calendar.google.com
    XML iCal HTML
  • Special events
    o0129tkgslt07rvl3drcbn9s6s@group.calendar.google.com
    XML iCal HTML
  • Birds of a Feather
    sbj0roal68qi3tu11humgrn8ro@group.calendar.google.com

Any errors or something to be added?  Mail me at c.j.wills@acm.org

Comparing 160-bit hash values with SSE

Sunday, September 13th, 2009

I’m writing this up as it’s one of those things I was trying to do and thought I’d just be able to crib how to do it from the Internet yet a good amount of googleing turned up nothing.  So here it it for anyone else looking for the same thing.

I was working on something at work where we put a whole load of entries into a dictionary (an STL map) using a 160-bit key (SHA-1 hash).  Using STL maps requires you to define an ordering on your keys and I’d knocked up a quick naïve Key class that compared keys by iterating through it in 32-bit chunks testing if those 32 bits were less than or greater than—you only need to continue to the next chunk if they are equal.  Here’s the class:

class Key {
 public:

 unsigned int & operator[] (int i) {
 return _data[i];
 }

 unsigned int operator [] (int i) const {
 return _data[i];
 }

 bool operator < (const Key &) const;

 private:
 static const int size = 5;

 unsigned int _data[size];
};

And here’s the simple comparison operator:

inline bool Key::operator < (const Key &k) const {
 for (int i = 0; i < size; i++)
 {
 if (_data[i] < k._data[i])
 return true;
 else if (_data[i] > k._data[i])
 return false;
 }

 // Equal
 return false;
}

There was a lot of talk about using SSE at the time so I decided that as an academic exercise I’d try to optimise the key comparison using SSE.  After all, loops with comparisons inside are bad, aren’t they?

Deciphering SSE

My only past experience with SIMD instruction sets had been writing a bilinear interpolation function using 3D-NOW back in 2000 – SSE was totally new to me.  First I had to work out how to use it from C++ (this is on linux).

With GCC on linux it seems you have a choice of intrinsics you can use.  GCC has built in vector types which appear to support simple operations and be portable to different architectures; the manual didn’t explain them very well and it was unclear if I’d be able to do what I was trying to do using them.  And then there are SSE specific intrinsics I think defined by Microsoft.  These are in various include files depending on which subset of SSE you want:

  • xmmintrin.h – SSE
  • emmintrin.h – SSE2
  • pmmintrin.h – SSE3
  • tmmintrin.h – SSSE3
  • smmintrin.h – SSE4.1
  • ammintrin.h – SSE4a
  • bmmintrin.h – SSE5

See how the already confusing SSE numbering scheme is further obfuscated?

These define functions like _mm_add_ps(a, b) that will add two vectors of four 32-bit fp values.  If you actually look in the header you’ll see it maps to __builtin_ia32_addps(a, b) which will (I hope) compile down to the ADDPS instruction.  There isn’t always a direct mapping from the intrinsic name to the instruction name.  Microsoft’s MSDN seems to be the goto place for documentation on these and I found the best way to map between intrinsics and instructions was to type them into google and look for the MSDN links.

Armed with this knowledge, and deciding i didn’t need anything beyond SSE2, I set forth.

Figuring out how to do it

I downloaded the AMD SSE docs and tried to figure out how to compare two compare two long values without ending up with a loop again.  And it’s not necessarily obvious when you just have a bit alphabetical list of opcodes.  I followed a few red herring lines of thinking for a bit then realised I could implement the same algorithm but doing the comparisons in parallel, removing the evil loop/compare/branch.

SSE2 has packed equality and greater than instructions for 8, 16 and 32 bit integers.  I decided to avoid the floating point comparisons since my hash values could have bit patterns for illegal fp values.  PCMPGTD seemed the obvious choice since it compares four 32-bit values at once—I decided to ignore the last 32-bits of my 160-bit key until I figured this bit out.  Unfortunately it sets the result by setting each 32-bit chunk to either all 1′s or all 0′s and it wasn’t to clear where to go from there.  However the result from the packed 8-bit comparison PCMPGTB can then be fed into PMOVMSKB instruction to get a nice packed 16-bit array of comparison results.  Thus I can can the packed 8-bit greater than, equal and less than results from 128-but values thusly:

inline bool Key::operator < (const Key &k) const {
 __m128i i1 = _mm_load_si128(reinterpret_cast<const __m128i *>(_data));
 __m128i i2 = _mm_load_si128(reinterpret_cast<const __m128i *>(k._data));

 __m128i gt128 = _mm_cmpgt_epi8(i1, i2);
 __m128i eq128 = _mm_cmpeq_epi8(i1, i2);
 unsigned int gt = _mm_movemask_epi8(gt128);
 unsigned int eq = _mm_movemask_epi8(eq128);
 unsigned int lt = ~(gt | eq) & 0x0000ffff;

The idea of our initial algorithm was that I start by comparing the most significant chunks of the two values and if a > b for that then the rest of the bits don’t matter, likewise is a < b.  Only if the chunks are equal do I have to move on to the next most significant chunk.  Since I now have two binary masks for gt and lt of the chunks I can interpret them as integers and simply say

bool result = gt < lt;

and we’ve compared two 128-bit values with no loops or branches.  And indeed it compiles down to:

movdqa    (%rsi), %xmm0
 movdqa    (%rdi), %xmm1
 movdqa    %xmm1, %xmm2
 pcmpgtb    %xmm0, %xmm2
 pcmpeqb    %xmm1, %xmm0
 pmovmskb    %xmm2, %edx
 pmovmskb    %xmm0, %eax
 orl    %edx, %eax
 notl    %eax
 andl    $65535, %eax
 cmpl    %eax, %edx
 setb    %al
 ret

with gcc.

Two asides

When I started looking at the compiler’s assembly output I was completely confused for at least a couple of hours about how it worked at all.  I’m not too familiar with x86 assembler (I grew up on 68000) and I’d been reading all the Intel/AMD docs that have it as op dst, src and the gcc output looked like it was loading things then overwriting them, using uninitialised values and all sorts of lunacy.  Turns out gcc uses AT&T syntax, which is op src, dst.

Also, the SSE version of the comparison is not equivalent to the original – we compare in 8-bit chunks rather than 32-bit and the order of the comparisons is messed up further by the little-endian storage of the 32-bit values in the array.  But it doesn’t matter as long as the ordering is consistent with itself.

Extending to 160 bits

We have 32 bits left over to compare.  Rather than trying to marshal them into our SSE scheme we treat them as the least significant chunk and only defer to them if the first 128 bits are equal, so the result becomes

bool r = gt < lt || _data[4] < k._data[4];

Our code now has a single conditional branch in it, but it has to be better than the original, no?

Evaluation

I wrote some tests to exercise the STL map – inserting a million random keys and then looking up both keys that we know are and are not in the map.  The results?  The SSE code is 10% slower.

A colleague pointed out that if the hash is any good then the comparison should almost always be decided on the first chunk and thus usually in the original algorithm only one 32-bit comparison happens anyway.  In the SSE code we are loading more data than needed most of the time and have the additional overhead of marshalling it around.

So the conclusion is think about your problem before plunging in thinking “SSE good, loops and branches bad”.  But it was a good SSE learning experience for me.

Siggraph 2009 google calendars

Monday, July 6th, 2009
Update: The calendars are offline.  Trying to clear up my google calendar list I inadvertently deleted them, but Siggraph was over a month ago anyway.
To add to your google calendars, cut and paste the blah@group.calendar.google.com address for the calendar into the “Add a friend’s calendar” box in the “Other calendars” list on the left side of the google calendar page.
  • Courses
    hvujkn0sbt7jhtirbg948htv18@group.calendar.google.com
  • Panels
    fkufpr69okdkoeghfpon1gf3ts@group.calendar.google.com
  • Papers (Previews from Ke-Sen Huang)
    bjem1niro4scd9f983ithrc0j8@group.calendar.google.com
  • Talks
    l802o82pfvrpdqeq51611i8e98@group.calendar.google.com
  • Special events
    u68gj6c9b7cqumpt4vqu4u2560@group.calendar.google.com
  • Parties!
    11aadmb8iiqj93ka0guk111lag@group.calendar.google.com

If you notice anything wrong or missing – and especially if you have leads on parties, which seem rather thin on the ground so far – email me at c.j.wills@acm.org

Projection mapping Arthur’s Seat

Sunday, June 21st, 2009

Here’s something fun I found while clearing out the hard drives on my old pc—my undergraduate graphics project.  Basically it involved projection mapping photographs of Arthur’s Seat (a big hill in Edinburgh) onto a model created from map contours.  Here’s a couple of videos:

Looks like shit, huh?  But it was a big deal for an undergraduate in 1997 ;)

There were two parts to it; generating a mesh from contour data and then texturing it using photographs.

Building the mesh

The starting point for the mesh was digitised contour data from the UK’s Ordnance Survey (the national mapping agency).  As far as I know the data is literally the contour lines digitised from paper maps so they aren’t as clean as one might like; for example there are gaps where text appears on the map.  And it’s all encoded into a fun text format called NTF.

After writing a NTF parser (which I think I did in perl) my first attempt to get useful data was to try and create complete contours.  I found a couple of papers on the internet that weren’t very useful, then tried taking the end point of each contour segment and trying to connect it to the nearest endpoint of another contour at the same elevation.  This produced fairly crappy results (I don’t remember why, but I still ended up with a lot of gaps.)

Time for a different approach. This time I took each digitised contour point and just treated it as an isolated point—a point cloud.  Then then assuming that contours don’t cross (that the ground never twists over itself) I can project them all onto the ground plane (i.e. set z = 0) and create a mesh in 2d using a delaunay triangulation.  Then bring back the z coordinates and voila, a 3d mesh.

The mesh is nice and detailed, but with more polygons than we really want—I forget how many, tens of thousands?  More than you wanted on 1997 hardware, anyway.  I implemented Michael Garland’s scape algorithm to decimate the mesh to a specified number of points.

I could now texture the mesh with the OS 1:50,000 map of the area and spin it around in real time on an O2—whee!

[Hardware aside: I was mostly writing this on the University's Sun workstations with the mesa software OpenGL library.  I also had access to the multimedia lab, which had some SGI kit; an O2, an Indy and a Onyx with Reality Engine graphics and a whole 4Mb of texture memory!  Around this time I also bought a 3dfx Voodoo Rush card for my PC at home, and getting it to run on that required porting the code to windows.]

Texturing

I set off with a borrowed SLR and my bicycle to take photos to use for texturing.  It started off as a nice sunny day but was raining by the time I got to the back side of the hill so the lighting is horribly variable.  I had the film scanned on to PhotoCD up to a resolution of about 3k—you can see the set here.

Calibrating the photos was a two stage process.  First marking on the map where I took the photos from and working out those co-ordinates, and then figuring out the camera rotation and field of view (I ignored lens distortion).  For this I wrote a little GL app that displayed the photo and the mesh and allowed the rotation and fov to be varied until the horizons matched.  This was kind of tedious, so I wrote a little routine to work out the error between the two horizons and threw a numerical optimiser (from Numerical Recipes) at it.

Now I could project the textures onto the model.  I had a lot of overlap in coverage, and the original idea had been to use view dependent texturing (an idea borrowed from the Facade paper) where each pixel is textured from the image where the projection is closest to the surface normal.  Fancy shading like this required ditching OpenGL and going to renderman (BMRT actually) .

The view dependent texturing looked pretty bad, mainly because of the lighting differences between the photos.  I attempted to balance the photos in PaintShop Pro, but it still looked poor.  In the end the shader just ended up blending the photos.

If I find more stuff (like more images) I’ll update this post.

My first plugin

Friday, June 5th, 2009

My first vfx job was with Smoke & Mirrors in London—I use the word ‘job’ in the loosest sense, as they were actually part-sponsoring my doctorate and part of the deal was that they had to employ me for three months over the three years (it was actually more complicated than that, involving a subsidiary that went bust, but that’s a story for another day).

S&M was your typical Soho post-production boutique; five flame suites and a bunch of crazy people rushing around.  Me, fresh-faced post graduate who’d seen A Bug’s Life and thought he’d like to work in computer graphics, is dumped in front of the engineering flame (i.e. unlicensed), handed the thousand page manual and asked to write a rack defocus plugin so that they didn’t have to buy another copy of a commercial one.

The flame API is actually very simple; you effectively get a buffer of pixels in and some parameters then have to produce a buffer of pixels out.  The hard parts were finding my way around the flame interface and working out what a rack defocus is.

I knew enough about optics and image processing to know that there’s not enough information in the 2d image to do a defocus properly.  So I spent a good while looking at what the commercial plugin did and googling stuff until I hit the in-retrospect obvious key—convolve the image with a kernel the shape of the lens aperture you want to see, and boost the highlights so that they don’t go flat and disappear when they are blurred.

So to get those nice hexagonal, octagonal and circular bokeh effects that people love so much I just needed to write a routine to draw a polygon with the appropriate number of sides into the kernel and then convolve it with the image (after first boosting any pixels above a certain threshold).  For the convolution I converted the image to floating point and used the FFTW library to do a fourier convolution, and believe me on an old SGI Octane it was slow.

Sure it’s simple and everyone knows how to do it, but it was a big deal to me at the time.  And here’s a little animation I found on my old pc, a still photo of Hong Kong with a foreground matte and animated defocus.

Siggraph sketch that never happened

Monday, June 1st, 2009

Back in 2003, I submitted a sketch to Siggraph on my thesis work.  I had been working at Framestore for a few months, was waiting for the University to organise my viva and figured that if I got something accepted I might have a fighting chance of getting someone else to pay my airfare to San Diego.

Also our group never pushed people very hard to publish so all I had out there was a couple of work in progress papers, and since I wasn’t intending to pursue this work further it seemed a shame that there wasn’t anything out there describing my work in its final form (other than the dusty copies of the thesis sitting on my and the University’s bookshelves).

So I sent it off into the electronic submission system, got an id number, and never heard anything again—I assume it was rejected but who knows, maybe it was just lost in the system.

But for those who care, here it is.