As Intel continues its march towards performance relevance in the graphics space with Haswell, it should come as no surprise that we're hearing more GPU related announcements from the company. At this year's Game Developers Conference, Intel introduced two new DirectX extensions that will be supported on Haswell and its integrated GPU. The first extension is called PixelSync (yep, Intel branding appears alive and well even on the GPU side of the company - one of these days the HD moniker will be dropped and Core will get a brother). PixelSync is Intel's hardware/software solution for enabling Order Independent Transparency (OIT) in games. The premise behind OIT is that overlapping transparent surfaces get blended in the correct order without the application having to sort them first, enabling some pretty neat effects in games if used properly. Without OIT, game designers are limited in the sorts of scenes they can craft. It's a tough problem to solve, but one with a big impact on game design.

Although OIT is commonly associated with DirectX 11, it's not a feature of the API but rather something that's enabled using the API. Intel's claim is that current implementations of OIT require unbounded amounts of memory and memory bandwidth: storage grows with the number of transparent fragments on screen, and the per-pixel sorting work can scale quadratically with the number of overlapping layers. Given that Haswell (and other integrated graphics solutions) will be more limited on memory and memory bandwidth than the highest end discrete GPUs, it makes sense that Intel is motivated to find a smaller footprint, more bandwidth efficient way to implement OIT.
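
To make the memory problem concrete, here's a minimal CPU-side sketch of the classic per-pixel list approach to OIT (the structure and names are illustrative, not taken from Intel or any shipping engine): every transparent fragment that lands on a pixel appends a node, so storage and sorting cost depend entirely on how many layers happen to overlap in a given frame.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Illustrative fragment record: depth plus premultiplied RGBA color.
struct Fragment {
    float depth;
    float r, g, b, a;
};

// Classic "A-buffer" style OIT: one growable list of fragments per pixel.
// Memory use is unbounded because it depends on how many transparent
// surfaces happen to overlap each pixel in a given frame.
struct ABuffer {
    int width, height;
    std::vector<std::vector<Fragment>> perPixel;

    ABuffer(int w, int h) : width(w), height(h), perPixel(size_t(w) * h) {}

    void addFragment(int x, int y, const Fragment& f) {
        perPixel[size_t(y) * width + x].push_back(f);  // grows without bound
    }

    // Resolve one pixel: sort its fragments back-to-front, then blend "over".
    void resolve(int x, int y, float out[4]) const {
        std::vector<Fragment> frags = perPixel[size_t(y) * width + x];
        std::sort(frags.begin(), frags.end(),
                  [](const Fragment& a, const Fragment& b) { return a.depth > b.depth; });
        out[0] = out[1] = out[2] = out[3] = 0.0f;
        for (const Fragment& f : frags) {
            out[0] = f.r + out[0] * (1.0f - f.a);
            out[1] = f.g + out[1] * (1.0f - f.a);
            out[2] = f.b + out[2] * (1.0f - f.a);
            out[3] = f.a + out[3] * (1.0f - f.a);
        }
    }
};

int main() {
    ABuffer ab(2, 2);
    ab.addFragment(0, 0, {0.8f, 0.5f, 0.0f, 0.0f, 0.5f});  // far fragment
    ab.addFragment(0, 0, {0.2f, 0.0f, 0.0f, 0.3f, 0.3f});  // near fragment
    float rgba[4];
    ab.resolve(0, 0, rgba);
    std::printf("%.2f %.2f %.2f %.2f\n", rgba[0], rgba[1], rgba[2], rgba[3]);
    return 0;
}
```

On a discrete card with gigabytes of memory that worst case is survivable; on an integrated GPU sharing system memory and bandwidth with the CPU, it's exactly the behavior Intel wants to avoid.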

The hardware side of PixelSync is simply the enabling of programmable blend operations on Haswell. On PC GPU architectures, all frame buffer operations flow through fixed function hardware with limited flexibility. Interestingly enough, this is one area where mobile GPUs have moved ahead of the desktop world - NVIDIA's Tegra GPUs reuse programmable pixel shader ALUs for frame buffer ops. The Haswell implementation isn't as drastic. There are still fixed function ROPs, but the Haswell GPU core now includes hardware that locks and forces the serialization of memory accesses when triggered by the PixelSync extension. With 3D rendering being an embarrassingly parallel problem, having many shader invocations working on overlapping pixels can create issues when running things like OIT algorithms. What PixelSync does is allow software to tell the hardware that, for a particular section of shader code, any invocations touching the same pixel(s) need to be serialized rather than run in parallel. The serialization is limited to directly overlapping pixels, so performance should remain untouched for the rest of the code. This seemingly simple change goes a long way towards enabling techniques like OIT, as well as giving developers the option of creating their own frame buffer operations.
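
The announcement doesn't go into the shader-side syntax, so treat the following as a loose CPU analogy only: the guarantee PixelSync provides looks a lot like wrapping the read-modify-write portion of a pixel shader in a per-pixel lock. Only work that touches the same pixel contends; everything else keeps running in parallel. All names below are made up for illustration.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// CPU-side analogy only: PixelSync is a hardware mechanism inside Haswell's
// GPU, but the behavior it exposes resembles taking a per-pixel lock around
// the read-modify-write section of a custom blend operation.
struct Framebuffer {
    int width, height;
    std::vector<float> color;          // one float per pixel for simplicity
    std::vector<std::mutex> pixelLock; // "PixelSync" stand-in: one lock per pixel

    Framebuffer(int w, int h)
        : width(w), height(h), color(size_t(w) * h, 0.0f), pixelLock(size_t(w) * h) {}

    // The "critical section" of a custom blend: only invocations that touch the
    // same pixel serialize; work on different pixels still runs fully in parallel.
    void blendFragment(int x, int y, float src, float alpha) {
        size_t idx = size_t(y) * width + x;
        std::lock_guard<std::mutex> guard(pixelLock[idx]);
        color[idx] = src * alpha + color[idx] * (1.0f - alpha); // read-modify-write
    }
};

int main() {
    Framebuffer fb(64, 64);
    // Many "shader invocations" hitting overlapping pixels from different threads.
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&fb, t] {
            for (int i = 0; i < 1000; ++i)
                fb.blendFragment(i % 64, (i + t) % 64, 1.0f, 0.5f);
        });
    }
    for (auto& th : threads) th.join();
    return 0;
}
```

The difference on Haswell is that this serialization happens in hardware inside the GPU, so the cost is only paid by fragments that actually overlap.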

The software side of PixelSync is an evolved version of Intel's own Order Independent Transparency algorithm, which leverages high quality compression to reduce memory footprint and deliver predictable performance. Intel has talked a bit about an earlier version of this algorithm in the past, for those interested in the details.
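
Intel hasn't detailed the exact algorithm alongside this announcement, but the general shape of bounded-memory OIT schemes is straightforward to sketch: rather than storing every transparent fragment per pixel, keep a small fixed set of nodes and lossily merge the closest pair whenever a new fragment doesn't fit. The snippet below is a generic simplification along those lines, not Intel's actual implementation.

```cpp
#include <array>
#include <cstdio>

// Simplified fixed-footprint transparency storage: at most K nodes per pixel,
// so memory use and bandwidth are bounded regardless of how many transparent
// layers overlap. Generic illustration only, not Intel's algorithm.
constexpr int K = 4;

struct Node {
    float depth;
    float r, g, b, a;  // premultiplied color + alpha
};

struct PixelNodes {
    std::array<Node, K + 1> nodes{};  // one slack slot for the incoming fragment
    int count = 0;

    void insert(const Node& n) {
        // Insert, keeping nodes sorted front-to-back by depth.
        int i = count;
        while (i > 0 && nodes[i - 1].depth > n.depth) {
            nodes[i] = nodes[i - 1];
            --i;
        }
        nodes[i] = n;
        ++count;
        if (count <= K) return;

        // Over budget: merge the two adjacent nodes closest together in depth,
        // compositing the farther one under the nearer one (lossy compression).
        int best = 0;
        for (int j = 1; j < count - 1; ++j)
            if (nodes[j + 1].depth - nodes[j].depth <
                nodes[best + 1].depth - nodes[best].depth)
                best = j;

        Node& nearNode = nodes[best];
        Node& farNode = nodes[best + 1];
        nearNode.r = nearNode.r + farNode.r * (1.0f - nearNode.a);
        nearNode.g = nearNode.g + farNode.g * (1.0f - nearNode.a);
        nearNode.b = nearNode.b + farNode.b * (1.0f - nearNode.a);
        nearNode.a = nearNode.a + farNode.a * (1.0f - nearNode.a);

        for (int j = best + 1; j < count - 1; ++j) nodes[j] = nodes[j + 1];
        --count;
    }
};

int main() {
    PixelNodes p;
    // Ten overlapping layers, but storage stays capped at K nodes.
    for (int i = 0; i < 10; ++i)
        p.insert({0.1f * i, 0.2f, 0.0f, 0.0f, 0.3f});
    std::printf("stored %d of 10 fragments\n", p.count);
    return 0;
}
```

Accuracy degrades gracefully as layers pile up, but the per-pixel footprint never grows - which is the property that matters on a bandwidth-constrained integrated GPU.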

Intel claims two developers have already announced support for PixelSync. Codemasters producer Clive Moody (GRID 2) is quoted in the Intel press release sounding excited about the new extension, and Creative Assembly also makes an appearance in the PR, saying the extensions will be used in Total War: Rome II.

The second extension, InstantAccess, is simply Intel's implementation of zero copy. Although Intel's processor graphics have supported unified memory for a while (CPU + GPU share the same physical memory), the two processors don't get direct access to each other's memory space. Instead, if the GPU needs to work on something the CPU has in memory, it first needs to make its own copy of it. The copy process is time consuming and wasteful. As we march towards true heterogeneous computing, we need ways of allowing both processors to work on the same data in memory. With InstantAccess, Intel's graphics driver can deliver a pointer to a location in GPU memory that the CPU can then access directly. The CPU can work on that GPU address without a copy and then release it back to the GPU. AMD introduced support for something similar back with Llano.
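
Intel hasn't published InstantAccess's developer-facing entry points in this announcement, so the sketch below only shows the general shape of zero copy using standard Direct3D 11 calls: the CPU gets a raw pointer into the resource from Map() and writes to it in place, with no staging buffer or copy in between. Whatever additional hooks InstantAccess provides for resources the GPU normally keeps to itself are not shown here.

```cpp
#include <d3d11.h>
#include <cstring>

// Shape sketch only: standard D3D11 dynamic-resource mapping. On a unified
// memory design like Haswell's, the pointer handed back by Map() can reference
// the same physical pages the GPU reads, so the CPU writes in place rather
// than filling a separate copy that gets uploaded later.
HRESULT WriteVerticesZeroCopy(ID3D11Device* device, ID3D11DeviceContext* context,
                              const void* vertices, UINT byteSize,
                              ID3D11Buffer** outBuffer) {
    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth = byteSize;
    desc.Usage = D3D11_USAGE_DYNAMIC;             // CPU-writable, GPU-readable
    desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;

    HRESULT hr = device->CreateBuffer(&desc, nullptr, outBuffer);
    if (FAILED(hr)) return hr;

    // Map() returns a raw CPU pointer into the resource...
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    hr = context->Map(*outBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
    if (FAILED(hr)) return hr;

    // ...which the CPU writes directly; on a zero-copy path nothing gets
    // duplicated behind the scenes before the GPU consumes it.
    std::memcpy(mapped.pData, vertices, byteSize);

    context->Unmap(*outBuffer, 0);
    return S_OK;
}
```

Per Intel's description, InstantAccess extends this idea so the CPU can get a pointer straight into memory the GPU is already using, work on it in place, and hand it back - no copy in either direction.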

Source: Intel PR

Comments

  • Jafar - Thursday, March 28, 2013

    I could be wrong, but AMD's zero copy will only see light this year, although it has been announced for a while.
  • TeXWiller - Thursday, March 28, 2013

    Here you go: http://amddevcentral.com/afds//pages/OLD/video.asp...
  • Jafar - Thursday, March 28, 2013

    Thanks TeX!

    Part of the confusion I had is probably due to vague memory from AFDS12, where AMD announced their HSA initiative, and the unified memory architecture. Zero Copy might mean different things (http://amddevcentral.com/afds/assets/presentations... ). I guess what AMD is trying to deliver this year and the next is CPU/GPU with "equal" rights to memory access. I believe current APUs still set aside part of the memory to be exclusive to the GPU.
  • iwodo - Thursday, March 28, 2013

    Can't wait to see how Haswell GT3 with the cache performs. I am hoping for a surprise.
  • A5 - Thursday, March 28, 2013

    Well, it's probably a big threat to the GT 640Ms of the world (which is great for an iGPU, mind you), but don't expect desktop-gaming level performance.
  • Shadowmaster625 - Thursday, March 28, 2013

    Ack! That van looks like it is photoshopped into that image - it looks like it is floating in the air above the fence rather than sitting on the ground. What is this, a game from 1997?
  • Mr Perfect - Thursday, March 28, 2013

    Those who live in glass houses...
  • yefi - Saturday, March 30, 2013

    See everything?
  • xenol - Wednesday, April 3, 2013

    Is it bad that I'm finding it amusing that zero copy is just being implemented on higher performance machines when it's been a staple in embedded programming for years?

    The framework I use handles events through a zero copy mechanism so as to avoid unnecessary memory consumption.
