I’ve started noticing articles and YouTube videos touting the benefits of branchless programming, making it sound like this is a hot new technique (or maybe a hot old technique) that everyone should be using. But it seems like it’s only really applicable to data processing applications (as opposed to general programming) and there are very few times in my career where I’ve needed to use, much less optimize, data processing code. And when I do, I use someone else’s library.
How often does branchless programming actually matter in the day to day life of an average developer?
If you want your code to run on the GPU, the complete viability of your code depends on it. But if you just want to run it on the CPU, it is only one of the many micro-optimization techniques you can use to shave a few nanoseconds off an inner loop.
The thing to keep in mind is that there is no such thing as an “average developer”. Computing is way too diverse for that.
And the branchless version may end up being slower on the CPU, because the compiler does a better job optimizing the branching version.
If you want your code to run on the GPU, the complete viability of your code depends on it.
Because of the performance improvements from vectorization, and the fact that GPUs are particularly well suited to that? Or are GPUs particularly bad at branches?
it is only one of the many micro-optimization techniques you can use to shave a few nanoseconds off an inner loop.
How often do a few nanoseconds in the inner loop matter?
The thing to keep in mind is that there is no such thing as an “average developer”. Computing is way too diverse for that.
Looking at all the software out there, the vast majority of it is games, apps, and websites. Applications where performance is critical, such as control systems, operating systems, databases, numerical analysis, etc., are relatively rare by comparison. So statistically speaking, the majority of developers must be working on games, apps, and websites (which is what I mean by an “average developer”). In my experience working on apps there are exceedingly few times where micro-optimizations matter (as in things like assembly and/or branchless programming, as opposed to macro-optimizations such as avoiding unnecessary looping/nesting/etc.).
Edit: I can imagine it might matter a lot more for games, such as in shaders or physics calculations. I’ve never worked on a game so my knowledge of that kind of work is rather lacking.
Or are GPUs particularly bad at branches?
Yes. GPUs don’t have per-core branching; they have groups of dozens of cores running the same instructions in lockstep. So if some cores should run the if branch and some should run the else branch, all cores in the group will execute both branches and mask out the one they shouldn’t have run. I also think they don’t have the advanced branch prediction CPUs have.
https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads
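To make that concrete, here’s a rough scalar sketch in plain C++ (not real GPU code, just an illustration of the idea): every “lane” computes both sides of the branch, and a per-lane mask decides which result it keeps.

    #include <array>
    #include <cstddef>
    #include <cstdio>

    int main() {
        // Pretend these are 8 "lanes" of a GPU warp, each with its own input.
        std::array<float, 8> x = {-3, 1, -2, 4, 5, -1, 0, 2};
        std::array<float, 8> out{};

        for (std::size_t lane = 0; lane < x.size(); ++lane) {
            // The GPU source would read: out = (x < 0) ? -x : x * 2;
            // What effectively happens: both sides are computed for every lane,
            // and an execution mask picks the result each lane actually keeps.
            float thenSide = -x[lane];        // "if" side, computed regardless
            float elseSide = x[lane] * 2.0f;  // "else" side, computed regardless
            bool  mask     = x[lane] < 0;     // per-lane execution mask
            out[lane]      = mask ? thenSide : elseSide;
        }

        for (float v : out) std::printf("%g ", v);
        std::printf("\n");
    }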
Makes sense. The most programming I’ve ever done for a GPU was a few simple shaders for a toy project.
How often do a few nanoseconds in the inner loop matter?
It doesn’t matter until you need it. And when you need it, it’s the difference between life and death.
How often do a few nanoseconds in the inner loop matter?
Fintech. Stock exchanges will go to extreme lengths to appease their wolves of Wall Street.
Also if you branch on a GPU, the compiler has to reserve enough registers to walk through both branches (handwavey), which means lower occupancy.
Often you have no choice, or removing the branch leaves you with just as much code, so it’s irrelevant. But sometimes it matters. If you know that a particular draw call will always use one side of the branch but not the other, a typical optimization is to compile a separate version of the shader that removes the unused branch and saves on registers.
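Not shader code, but the same trick is easy to sketch in C++: compile separate specialized versions so the unused branch is gone entirely, instead of being tested (or masked) on every element. Here if constexpr stands in for what a shader #ifdef or specialization constant would do; the names are made up for illustration.

    #include <cstddef>

    // Two specialized versions of the same loop; the Clamp=false version
    // contains no trace of the branch, much like stripping it from a shader.
    template <bool Clamp>
    void process(float* data, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            float v = data[i] * 2.0f;
            if constexpr (Clamp) {
                if (v > 1.0f) v = 1.0f;  // only exists in the Clamp=true variant
            }
            data[i] = v;
        }
    }

    // The caller picks the variant once, outside the hot loop:
    //   process<true>(buffer, count);   // clamping variant
    //   process<false>(buffer, count);  // branch-free variant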
The better of those articles and videos also emphasize you should test and measure, before and after you “improved” your code.
I’m afraid there is no standard, average solution. You trying to optimize your code might very well cause it to run slower.
So unless you have good reasons (good as in ‘proof’) to do otherwise, I’d recommend to aim for readable, maintainable code. Which is often not optimized code.
One of the reasons I love Go is that it makes it very easy to collect profiles and locate hot spots.
The part that seems weird to me is that these articles are presented as if it’s a tool that all developers should have in their tool belt, but in 10 years of professional development I have never been in a situation where that kind of optimization would be applicable. Most optimizations I’ve done come down to: I wrote it quickly and ‘lazy’ the first time, but it turned out to be a hot spot, so now I need to put in the time to write it better. And most of the remaining cases are solved by avoiding doing work more than once. I can’t recall a single time when a micro-optimization would have helped, except in college when I was working with microcontrollers.
Given the variety of software in existence I think it’s hard to say that something is so universally essential. Do people writing Wordpress plugins need to know about branch prediction? What about people maintaining that old .NET 3.5 application keeping the business running? VisualBasic macros?
I agree it’s weird. Probably more about getting clicks/views.
Please please please, God, Allah, Buddha, any god or non-god out there, please don’t let any engineer bring up branchless programming for an AWS Lambda function in our one-function-per-micro-service f*ckitechture.
Exactly, this sounds like a good way to optimize prematurely…
Necessary for cryptographic code, where data-dependent branches can create a side-channel which leaks the data.
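For example, a typical constant-time comparison looks something like this (a minimal sketch, not taken from any particular library): it touches every byte and never branches on secret data, so the runtime depends only on the length.

    #include <stddef.h>
    #include <stdint.h>

    // Returns 1 if the two buffers are equal and 0 otherwise, with no
    // data-dependent branches or early exits: runtime depends only on len.
    int constant_time_equals(const uint8_t *a, const uint8_t *b, size_t len) {
        uint8_t diff = 0;
        for (size_t i = 0; i < len; ++i) {
            diff |= a[i] ^ b[i];  // accumulate differences, never break early
        }
        // Map diff == 0 to 1 and anything else to 0 without a comparison branch.
        return (int)((((uint32_t)diff - 1u) >> 8) & 1u);
    }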
I thought it might be helpful for optimizing cryptographic code, but it hadn’t occurred to me that it would prevent side-channel leaks.
I only know of a handful of cases where branchless programming is actually being used. And those are really niche ones.
So no. The average programmer really doesn’t need to use it, probably ever.
If you haven’t profiled your code and noticed that branch misprediction is a problem, then it doesn’t matter.
Yeah, especially if it isn’t done on the GPU (where branch optimization certainly makes more sense). Branch prediction in CPUs is pretty smart these days.
It’s useful in digital signal processing, but otherwise it just makes your code harder to read.
    const int resultBranchless    = aVal * cond + bVal * (1 - cond);
    // vs
    const int resultWithBranching = cond ? aVal : bVal;
Usually compilers will optimize the second one to a cmov or similar instruction, which is about as close to fast branching as you can get (except fcmov on older x86 CPUs), and is DSP compatible.
if branchless goto nobranch else return
Can’t imagine any practical difference performance-wise. Maybe it’s about making the flow easier to understand? I do recall that SonarQube sometimes complains when you have too many branches in a single function.
If you’re writing data processing code, there are real advantages to avoiding branches, and it’s especially helpful for SIMD/vectorization such as with AVX instructions or code for a GPU (i.e. shaders). My question is not about whether it’s helpful - it definitely is in the right circumstances - but about how often those circumstances occur.
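For concreteness, this is roughly the shape it takes with AVX intrinsics (just a sketch, assuming a CPU with AVX; the names are made up): the “branch” becomes a compare that produces a per-lane mask plus a blend, applied to eight floats at once.

    #include <cstddef>
    #include <immintrin.h>

    // Branchless per-element select: out[i] = (x[i] > 0) ? x[i] : 0, eight
    // floats at a time. Assumes n is a multiple of 8 and the CPU supports AVX
    // (compile with -mavx or equivalent).
    void clamp_negatives_to_zero(const float* x, float* out, std::size_t n) {
        const __m256 zero = _mm256_setzero_ps();
        for (std::size_t i = 0; i < n; i += 8) {
            __m256 v    = _mm256_loadu_ps(x + i);
            __m256 mask = _mm256_cmp_ps(v, zero, _CMP_GT_OQ); // per-lane condition
            __m256 r    = _mm256_blendv_ps(zero, v, mask);    // keep v where true, else 0
            _mm256_storeu_ps(out + i, r);
        }
    }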
Ya, and my estimation is that it doesn’t have much practical impact on day-to-day tasks. Unless you’re writing AVX instructions day to day, but then you already knew the answer.
As a webdev I’ve honestly never even heard of it