The Asm Mistake

When writing C code for x86, what is the best way to execute x86-specific instructions such as cpuid, rep movsb, bsf, movntq, etc.? Thirty years ago, there were only two choices: inline assembly code or stand-alone assembly language functions. Today, engineers often use stand-alone assembly language functions, in part because some 64-bit compilers, notably Microsoft's x64 compiler, do not support inline assembly language.

Stand-alone assembly language functions have several disadvantages: they are not portable across tool vendors, not portable between 32-bit and 64-bit builds, and not portable across calling conventions. For example, say you want to write a bsf function that works with both Microsoft and GNU tools, in both 32-bit and 64-bit builds, and with both the cdecl and fastcall calling conventions (for 32-bit code). That is three variants per toolchain (32-bit cdecl, 32-bit fastcall, 64-bit), so you have to write 6 assembly language functions.

In the 1980s, some clever engineers devised a solution that eliminated these limitations: intrinsic functions. At that time, intrinsic functions were available for only a few x86 instructions such as IN, OUT, CLI, STI. During the 1990s, Intel, Microsoft, GNU and others greatly expanded the list of x86 instructions that can be generated by intrinsic functions. Today, thousands of x86 instructions have intrinsic support (see *intrin*.h for details).
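As a rough illustration of the bsf case, here is a minimal sketch of one C function that could stand in for all six assembly versions. The name lowest_set_bit is hypothetical and chosen only for this example. The sketch assumes the Microsoft _BitScanForward intrinsic from intrin.h and the GNU __builtin_ctz builtin, and it assumes the compiler lowers each of them to a single bit-scan instruction (bsf or tzcnt, depending on the target options), which is typical but not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>   /* Microsoft-style intrinsics, including _BitScanForward */
#endif

/* Hypothetical helper: returns the index (0..31) of the lowest set bit.
 * The same source builds for 32-bit and 64-bit targets and under any
 * calling convention, because the compiler handles the register usage. */
static inline unsigned lowest_set_bit(uint32_t value)
{
    /* The bsf instruction leaves its destination undefined for a zero
     * source, so this sketch treats zero as a caller error. */
    assert(value != 0);

#if defined(_MSC_VER)
    unsigned long index;
    _BitScanForward(&index, value);        /* Microsoft intrinsic */
    return (unsigned)index;
#else
    return (unsigned)__builtin_ctz(value); /* GCC/Clang builtin */
#endif
}
```

A call such as lowest_set_bit(0x50) yields 4, since bit 4 is the lowest bit set in 0x50. Because the function is inlined, the compiler can also schedule and optimize around the instruction, which a call into a separate assembly routine would prevent.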


Why do C programmers still use stand-alone assembly language functions for executing single x86 instructions when the intrinsic function alternative is superior in every respect?