Hacker Times | MTGandP's comments

Can you elaborate on why -O3 often generates slower code? I've never heard that before.


It is common knowledge among GCC users. In the past using -O3 was rare because it often generated downright broken code. There used to be an official warning about that.

The situation is better nowadays but still, as far as I know, no major Linux distro uses -O3 as the default for binary packages.

-O3 can generate slower code because of the aggressive inlining and loop unrolling enabled. These optimizations are very tricky because of their effect on cache use. Basically all that extra code can push other needed code/data out of the cache, which can cause a noticeable decrease in performance.


I think it's 'common knowledge' which has outlived its relevance, as I can't recall the last time I found -O2 outperforming -O3.

Practically every performance oriented open source program I come across also defaults to -O3 these days, or sometimes -Ofast which also enables -ffast-math.

>-O3 can generate slower code because of the aggressive inlining and loop unrolling enabled

-O3 turns on vectorization and inlining optimizations but I can't recall any loop unrolling options which are turned on at -O3.

-funroll-loops is not turned on at any of the -O levels (including -O3), because it is one of the hardest optimizations to get right without runtime data as a basis (which is why the only option that turns it on is PGO, profile-guided optimization).

Note that I'm talking about modern versions of GCC; if you are using GCC 4.2.1 on OS X then this (-O2 > -O3) may still typically be the case.

>The situation is better nowadays but still, as far as I know, no major Linux distro uses -O3 as the default for binary packages.

I'd say they typically use the upstream optimization settings.


>I think it's 'common knowledge' which has outlived its relevance as I can't recall the last time I found -O2 outperforming -O3.

I can; it was about 4 months ago with GCC 4.8.0.

>practically every performance oriented open source program I come across also defaults to -O3 these days

How large is your sample size there? I have only seen -O3 in the default makefiles of audio/video encoders. Those tend to be a natural fit for -O3. In contrast, here is the current makefile of my favorite "performance oriented" FOSS program:

http://repo.or.cz/w/luajit-2.0.git/blob_plain/HEAD:/src/Make...

    CCOPT= -O2 -fomit-frame-pointer
    # Note: it's no longer recommended to use -O3 with GCC 4.x.
    # The I-Cache bloat usually outweighs the benefits from aggressive inlining.

>I can't recall any loop unrolling options which are turned on at -O3.

You are right (I just looked it up). Guess my memory failed me there.

>I'd say they typically use the upstream optimization settings

I wish! Packagers love to fool around with the upstream sources and makefiles to make them conform to whatever "standards" they have.


>How large is your sample size there? I have only seen -O3 in the default makefiles of audio/video encoders. Those tend to be a natural fit for -O3

Well, I very much implied 'performance-oriented' programs, as we were discussing the 'performance' generated by compiler options, and those programs are indeed a natural fit for -O3.

For which my 'sample size' would be software like encoders, archivers, emulators, 3d renderers etc.

Obviously there's little point in using -O3 on your text editor (yes, an extreme example); for any non-performance-oriented software, -O3 will likely only serve to increase the binary size, as any potential gains will be unnoticeable.

>I wish! Packagers love to fool around with the upstream sources and makefiles to make them conform to whatever "standards" they have.

Not really my experience with Arch packages, but of course I haven't looked at the PKGBUILDS for even 1% of all available packages, basically only those performance oriented packages on which I rely.


-O3 inlines functions and unrolls loops more aggressively, so the increased code size might not fit in the CPU cache.


Fair enough... let me do a quick test with -O2 and see how that fares


Might want to also give a go at -Os (optimize for small code size). On code that spends its time iterating on the same code over and over again this can be a big win.

[edit] Nope, definitely not better. I get O2 being a slight win over O3 and Os being much worse.


Compiler optimization flags are very code and type specific.

(Note that I am comparing apples to oranges here, I used the C++ code used in Rust experiments found here: https://github.com/huonw/card-trace/blob/master/original.cpp )

I changed the C++ version's typedef float f to typedef double f (so using doubles instead of floats) and compiled with the following flags:

    -m64 -march=corei7-avx -mtune=corei7-avx -Ofast -funroll-all-loops
and the run time dropped from 17.5 seconds to 11.2 seconds. If I remove -funroll-all-loops, the run time jumps to 14.2 seconds. The original 17.5 seconds was measured with the vanilla code using float and -O3. Interestingly enough, if you use the aforementioned flags with floats instead of doubles, the program executes in 15.01 seconds instead. Using floats is bad for performance! Further, if you remove -funroll-all-loops when using floats, the performance increases, but with doubles it decreases.

So, when optimizing, play with compiler flags. Play with types. Play with whatever you have at your disposal and make no assumptions. This stuff is far more complex than believing that certain flags are better than others; it all depends on everything.
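To make "play with compiler flags" a bit more systematic, here is a minimal Python sketch of a timing harness. The flag lists, the function names, the `g++` default and the `./bench.out` path are all my own assumptions for illustration, not anything from the comment above; it simply builds every combination of an optimization level and an optional extra flag, then keeps the fastest of a few runs.

```python
import itertools
import subprocess
import time

# Hypothetical flag choices; adjust them to whatever you want to compare.
OPT_LEVELS = ["-O2", "-O3", "-Ofast", "-Os"]
EXTRA_FLAGS = [[], ["-funroll-all-loops"]]


def flag_matrix(opt_levels=OPT_LEVELS, extras=EXTRA_FLAGS):
    """Return every combination of one optimization level and one extras list."""
    return [[opt] + extra for opt, extra in itertools.product(opt_levels, extras)]


def time_command(argv, repeats=3):
    """Run argv `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def benchmark(source, compiler="g++", binary="./bench.out"):
    """Compile `source` once per flag set and time the resulting binary."""
    results = {}
    for flags in flag_matrix():
        subprocess.run([compiler, *flags, source, "-o", binary], check=True)
        results[" ".join(flags)] = time_command([binary])
    return results
```

Calling `benchmark("original.cpp")` would return a dict mapping each flag string to its best time. Taking the minimum of several runs is a design choice to reduce noise from other processes; it does not replace measuring on the hardware you actually care about.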


So it totally disables all loop unrolling, inlining... hmm


Does a couple of other things, including choosing instruction sequences that are more compact afaik. But also favouring compactness over alignment and obviously jumps over unrolling. Obviously this isn't code that benefits terribly much from it, but it has been known to happen.


It's nice to see some big names focusing on anti-aging research. It seems obvious that anti-aging research is a good idea, but surprisingly few people are actually pursuing it.

This video (http://www.youtube.com/watch?v=Pm-5s__aZE0) has an interesting debate about whether immortality would be a good thing. I'm still uncertain about this; but I think it's fairly straightforward that our healthspans have not yet reached optimal length. Heck, we currently spend the first 20-30% of our lives just getting the ball rolling.


"The Dennis Ritchie of this generation was just born."


You can use [my keyboard layout optimizer](https://github.com/MTGandP/Typing) to create your own optimized keyboard. I provide a corpus of programming text you can use, and you can change the weightings to increase the significance of the programming text. (I don't remember if I wrote documentation explaining how to do this; if you want to do this but can't figure out how, remind me and I'll write up some documentation.)


The [Kinesis](http://www.kinesis-ergo.com/) Advantage is a cheaper alternative to Maltron that follows the same design pattern. It's the keyboard I use, and it's definitely more comfortable than a typical staggered keyboard, or even a simpler ergonomic keyboard.


Author of The Keyboard Layout Project here. I'm familiar with Workman; it's a reasonably strong attempt, but I think it has a number of problems and fails to perform as well as Colemak or my own layouts.

If you compare Workman and MTGAP 3.14 (shown [here](http://mathematicalmulticore.wordpress.com/the-keyboard-layo...)) using [my keyboard evaluator](https://github.com/MTGandP/Typing), MTGAP 3.14 performs better on every single statistic except finger work. (You can see all of the different statistics explained on [my GitHub page](https://github.com/MTGandP/Typing/blob/master/Fitness). I didn't post them all here because it takes up too much space, but if you want, you can pull the program and do the comparison yourself.)


IIRC, Dvorak has about 500,000 users worldwide.


Okay. If that's just food for thought, fine, but if that's an argument for widespread adoption it falls pretty short. In 2011, 35% of the world's population were internet users, according to a quick Wikipedia reference [1]. That's 2.45 billion people. Taking the 500,000 Dvorak users into account yields 1 Dvorak user out of every 4900 internet users. I realize these are all kind of soft numbers we're throwing around, but I feel my point stands.

[1] FWIW, http://www.itu.int/ITU-D/ict/facts/2011/material/ICTFactsFig...
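
The arithmetic above can be checked in a few lines of Python. The 7 billion world population figure is an assumption (it is not stated in the comment, but it is what 35% must be applied to in order to get 2.45 billion); the other numbers are the ones quoted in the thread.

```python
# Assumed world population for 2011; not stated in the comment itself.
world_population = 7_000_000_000
internet_share_percent = 35   # quoted in the comment
dvorak_users = 500_000        # quoted earlier in the thread

# Integer arithmetic keeps the result exact.
internet_users = world_population * internet_share_percent // 100
users_per_dvorak_user = internet_users // dvorak_users

print(f"{internet_users:,} internet users")
print(f"1 Dvorak user per {users_per_dvorak_user:,} internet users")
```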


I'm not trying to design yet another standard. I recommend that people use [Colemak](http://colemak.com/)—even though it's not as popular as Dvorak right now, it's easier to learn, so I think it has a better chance of overtaking QWERTY in the long run.

I design keyboard layouts because it's fun. I want to discover what it means for a layout to be the best. It's more of a scientific curiosity than a desire to make things better—although I do think QWERTY is in need of replacement.

> qwerty might be 1 to 5% less optimal than something else - let's even say 10%.

It's definitely a lot worse than that. If you're just talking about finger travel distance, for example, most modern keyboards (Colemak, [Arensito](http://www.pvv.org/~hakonhal/main.cgi/keyboard), my keyboard) do about three times better.
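
For anyone who wants to poke at finger travel themselves, here is a rough Python sketch. It is not the scoring used by my evaluator: the distance model (fingers resting on the home row, a flat unstaggered grid, index fingers covering the two inner columns) and all the names are simplifying assumptions, so the numbers it gives are only indicative.

```python
from math import hypot

QWERTY = ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]
COLEMAK = ["qwfpgjluy;", "arstdhneio", "zxcvbkm,./"]

# Assumed model: each finger rests on the home row (row 1). Columns 4 and 5
# are reached by the index fingers, which rest on columns 3 and 6.
HOME_COLUMN = [0, 1, 2, 3, 3, 6, 6, 7, 8, 9]
HOME_ROW = 1


def key_positions(layout):
    """Map each character to its (row, column) on the grid."""
    return {ch: (row, col)
            for row, keys in enumerate(layout)
            for col, ch in enumerate(keys)}


def travel_distance(text, layout):
    """Sum the distance from each finger's resting key to every typed key.

    Characters that are not on the layout (spaces, digits, ...) are skipped.
    """
    positions = key_positions(layout)
    total = 0.0
    for ch in text.lower():
        if ch not in positions:
            continue
        row, col = positions[ch]
        total += hypot(col - HOME_COLUMN[col], row - HOME_ROW)
    return total
```

Comparing `travel_distance(corpus, QWERTY)` with `travel_distance(corpus, COLEMAK)` on a reasonably large corpus gives a crude ratio; a real evaluator would also account for row stagger and for where the finger actually was before the keystroke.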


That's why my keyboard layouts try to optimize for rolls more than any other layout I've seen. The most recent version has a number of very nice rolls: TH, IN, ST, and some other less frequent digraphs.
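
As a rough illustration of what counting rolls can look like, here is a small Python sketch. The definition of a roll used here (two consecutive keys on the same hand, typed with different fingers) is a simplification I am assuming for the example, and Colemak is used purely as a stand-in layout, so TH does not come out as a roll here even though it is one on my layout.

```python
from collections import Counter

# Colemak, used only as a stand-in layout for the example.
LAYOUT = ["qwfpgjluy;", "arstdhneio", "zxcvbkm,./"]

# Assumed finger per column: 0-3 are the left hand (pinky to index),
# 4-7 are the right hand (index to pinky).
FINGER = [0, 1, 2, 3, 3, 4, 4, 5, 6, 7]


def finger_map(layout):
    """Map each character on the layout to the finger that types it."""
    return {ch: FINGER[col] for keys in layout for col, ch in enumerate(keys)}


def digraph_counts(text):
    """Count adjacent character pairs within whitespace-separated words."""
    counts = Counter()
    for word in text.lower().split():
        counts.update(a + b for a, b in zip(word, word[1:]))
    return counts


def is_roll(digraph, fingers):
    """True when both keys are on the same hand but on different fingers."""
    a, b = fingers.get(digraph[0]), fingers.get(digraph[1])
    if a is None or b is None or a == b:
        return False
    return (a < 4) == (b < 4)


def roll_share(text, layout=LAYOUT):
    """Fraction of all counted digraphs in `text` that are rolls on `layout`."""
    fingers = finger_map(layout)
    counts = digraph_counts(text)
    total = sum(counts.values())
    rolls = sum(n for dg, n in counts.items() if is_roll(dg, fingers))
    return rolls / total if total else 0.0
```

Running `roll_share` over the same corpus for several layouts gives one number to compare, though a serious evaluator would likely also distinguish inward from outward rolls and weight them differently.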


Hi, writer of The Keyboard Layout Project here. I'm well aware of the limitations of scoring a layout based on crude heuristics, and I'm trying to collect some good data. Right now I have some data from a few layouts that I've collected using [Amphetype](http://code.google.com/p/amphetype/), which records speed and accuracy for characters, trigraphs, and words. I'm curious, what program did you use to collect typing data?

If you or anyone else has typing data or is willing to collect some, please post a comment on my blog (preferably [here](http://mtgap.wordpress.com/2010/01/16/wanted-typing-data/)) and I will email you.

Ideally, we would have typing data not just from QWERTY and carefully-designed layouts, but randomized layouts. All the layouts people use (except QWERTY) share design patterns—putting the most common keys on the home row, etc. We can get more accurate data if we know how people type on random layouts.

I plan on using a little grant money from my school to pay some people to learn randomized keyboard layouts and then record typing data. If anyone's willing to do this for free, I'd love to have your help; again, you can contact me by leaving a comment on my [blog](https://mathematicalmulticore.wordpress.com/).

Data collection with a web app is also a good idea. I don't plan on writing such a thing any time soon, but if anyone does plan to, I think it would be very useful.


I'll see if I can dig up my old app for collecting the data (it was an OS X app, but would be easy enough to port). It can actually be layout agnostic (although I may have just implemented it with Colemak in mind, since it was for my own use). For the user's sake, it asks you to type out symbols according to your own layout, but the way it stores the data is only interested in key positions.

One confounding factor in collecting data this way, though, is it's actually quite a lot more difficult to copy a random sequence of characters than it is to copy real words, and I don't think that's just because of the key positions. I think it just takes more brain power to process them when you can't leverage your brain's language hardware to divide it into chunks.
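
To make the "only interested in key positions" idea concrete, here is a minimal Python sketch of how such records could be stored. This is not the actual app; the `KeyEvent` type, its `millis` field and the `record` function are hypothetical names invented for the example.

```python
from dataclasses import dataclass

QWERTY = ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]
COLEMAK = ["qwfpgjluy;", "arstdhneio", "zxcvbkm,./"]


@dataclass(frozen=True)
class KeyEvent:
    """One keystroke, stored by physical position rather than by character."""
    row: int
    col: int
    millis: int  # hypothetical field: time since the previous keystroke


def record(keystrokes, layout):
    """Convert (character, millis) pairs typed on `layout` into KeyEvents.

    Characters that are not on the layout are skipped.
    """
    positions = {ch: (row, col)
                 for row, keys in enumerate(layout)
                 for col, ch in enumerate(keys)}
    events = []
    for ch, millis in keystrokes:
        if ch in positions:
            row, col = positions[ch]
            events.append(KeyEvent(row, col, millis))
    return events
```

With this representation, typing "st" on Colemak and "df" on QWERTY produce identical records, which is what lets data from different layouts be pooled.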


Thanks, that could be valuable. Is the data separated by keyboard layout?


I wonder what the ethics review board would think about training someone on a nonstandard keyboard layout, as that would introduce a handicap to their life outside the study.


You can use a new keyboard layout and still remember QWERTY. It's just like learning a second language (only it takes a lot less time).


You can, but if you don't regularly practice, your QWERTY proficiency will deteriorate startlingly quickly. It's easier to pick back up than learning a whole new layout, but still.

It was really unsettling to watch, in only two weeks, my QWERTY proficiency drop from "90 WPM burst speed" to "stare at the keyboard while I type".


Then I'll make sure my test subjects continue to practice QWERTY.

