Kind of wish people would stop inventing new notation. There's already Microsoft SAL. Just use the existing annotations (like _In_reads_(N) here) instead of forcing people to adopt two of them in their codebases! https://learn.microsoft.com/en-us/cpp/code-quality/understan...
C23 did not get around to introducing this kind of forward declaration (and, although GCC has had them for a long time, Clang has not implemented that extension either). I think I remember Gustedt (the chair for C11 and C17) openly advocating against the ages-old practice[1] of putting lengths after pointers, I suspect in part because of this problem, and Meneide (the current one) also puts lengths first in his proposed encoding APIs. In any case, the proposal[2] (ETA: by Martin Uecker, who’s also spoken up both in the linked thread and here) is still under consideration, it just didn’t get into C23.
Oh. With all the talk about national body comments I thought a freeze of some sort was already in effect. (Which did make the “nope” paper on integer constant expressions[1] from Clang more than a bit surprising.) Thanks for the correction!
The freeze is in effect, but issues can still be addressed with NB comments. There was a NB related to this topic and WG14 asked me to create a new revision of this paper.
> Unfortunately this constraints the argument order
In both C and C++, it drives me absolutely bonkers that declaration order matters so much.
Point of personal preference, but they're the last languages I use with any regularity where you can't just reference a symbol that will be declared later in the file. In this case, the compiler should be able to figure it out without any of that hinting; the scope of potential valid bindings is in the parentheses. It's a very constrained scope!
Requiring things to be defined in the right order is a bit annoying, as the computer can surely figure it out... but a lot of us are working with really large codebases that are very slow to compile, and relieving the compiler of having to know the entire file--or in some cases entire modules... madness!!--before being able to understand what any of it means really helps.
If compilation speed is an issue, you're already sunk in C and C++ because the #include evaluation rules leave very little room to optimize the massive redundancy of recomputing headers over and over (every header includes all of its own headers, and so on).
Sure, you could also put a lot of complicated macros into headers. But in practice it is not a problem because this is not how you would do it in C. But in C++ you put the definition of a class into the header. Also templates basically have to exist in headers. So it is normal for C++ code to have a lot of code in headers.
You're basically saying pooling code between a large group might be worthwhile even if it requires you to take on a lot of book-keeping the computer could in principle help with. I don't think I've seen anyone else put it this way before.
Then again, we have so much compute going to LLMs lately, I wonder if we need to revisit our assumptions about how much hardware we have available for compilation. What might a language look like with the assumption we can spread LLM-training levels of hardware among everyone working on a large codebase? Or a compiler that can consult some sort of ML-based cache for reordering decisions? There's a new space of options opening up here.
g is UB which is why one can use it easily for checking. h is fine because a is larger than 7. If it were smaller the call could be diagnosed. If you overwrite the pointer inside the function, then the bounds will be lost. (which is different to pa = ... where the bounds still need to match).
The moment the size isn't trivially available the warnings cease. I was using that example just to keep the code brief.
This is a problem because it's trivially easy to have code along the lines of
    void foo(int size, int buffer[size]) {
        ...
        int new_base = some_math;
        int new_size = some_other_math;
        foo(new_size, buffer + new_base);
        ...
    }
that computes the size or base incorrectly, and the compiler will do nothing to stop you walking off the ends.
Obviously this RFC just uses the existing syntax if it's there and doesn't need explicit annotations, but the existing syntax requires the size parameter first which many APIs don't have. gcc has an extension to pre-declare a parameter name and type but you can't just use macros to make that work in other or older compilers. They also don't support sizes that aren't just a direct reference to a parameter (e.g. you can't make void some_matrix_func(int size, float matrix[size*size])).
As far as I can tell SAL is a static analyzer aid, and only applies to specifically annotated values.
This RFC applies to all pointers in all cases, and bounds checking occurs on every pointer access at runtime. So while the annotations clearly overlap, the language semantics are vastly different.
There are no language semantics for either, they're just declarative annotations about run time behavior. Either compiler can add checks at compile or run time for the annotations. There's really no reason to have two sets of parallel annotations for the same constraint.
GCC and Clang have a standard syntax for attributes, which is what this feature uses. Moreover the position semantics of the __attribute__ syntax matches the positional semantics of the standardized C++ attribute syntax, which is (fingers crossed) finally being picked up in C23.
So with that mea culpa, let's address the rest.
__counted_by is obviously not standardized - this is an RFC for clang, not even a spec proposal yet.
__counted_by is a macro that expands to:
    __attribute__((__counted_by__(T)))
Which is the standard syntax for all attributes in clang and gcc.
So the question is why the position is different from SAL's. SAL considers the bounds of a pointer to be a feature of the declaration, not the type. This RFC considers pointer bounds to be an attribute of the type itself. Because of that, the syntax of the attribute needs to be in the location for type attributes. You might reasonably respond with "a one-off match of this other syntax would have been ok", but the SAL syntax also means that you can only specify one set of bounds per declaration. By having bounds be an attribute of the type you can specify the bounds at each level of indirection in a declaration, so you can have something like:
    void dump_string_list(int N, const char * __null_terminated * __counted_by(N) thing);
The other part is that this RFC also allows for opt in wide pointers so you can do
    void something_else(int N, int * __indexable * __counted_by(N) thing);
Which would presumably not be usable for an existing API, but for internal logic that isn't subject to ABI constraints would allow the adoption of bounds safety without significant source changes. It also highlights the impact on syntactic consistency of having this be an attribute of the type.
Now, given the necessary positional difference, reusing the same token as used for MS's SAL would make these incompatible, so code bases would need to use different macro names for each anyway.
Finally, if it was considered really valuable, then someone could implement the same implicit adoption that occurs when SAL attributes are present, as already occurs if you do something like
    void f(int N, int buffer[N])
while retaining the significantly more powerful and flexible attributes provided by this RFC.
After Oracle v. Google, can a project consider a protocol designed by Microsoft safe to use even if they're doing their own implementation of the contract enforcement engine?
Making sure, firstly, that your implementation is clean room and, secondly, that you can prove that in court, is quite a hassle, though, especially if your implementers visited various conferences and standard meetings where they met people working on the feature being reimplemented.
I recall reading that SAL was patent encumbered but I can't for the life of me find the actual patent. I remember it coming up in the llvm forums though..
It's bounded by the size of the object, e.g. the struct. You know this because it was either allocated with malloc or on the stack. There's a paper at the link (one of the most referenced on this topic) which explains everything in detail.
I like it and it is on my list to implement as a prototype for GCC.
And WG14 takes security seriously, but I think you overestimate the power WG14 has to simply change things. WG14 is supposed to standardize existing practice, not reinvent the language. So you should complain to compiler vendors. Or contribute to the development of open-source compilers.
I will believe that when C finally gets, at the very least, vocabulary types for safer strings and arrays, instead of reboots of functions with pointer/count pairs.
Plenty of compiler vendors have extensions for safer C. Microsoft introduced a similar annotation mechanism to the one being discussed here when Windows XP SP2 came out, in 2004.
Apparently 50 years weren't enough to make it happen.
Contrast this with how WG21 looks into security and code safety: plenty of papers on the subject, especially after the cybersecurity bills started to come about.
Out-of-bounds accesses are undefined behaviour. A compiler is free, within the bounds of the standard, to do whatever it wants for them, including doing dynamic checks.
C compilers have been doing stuff like that for decades… trivial example: many compilers will insert an implicit return statement if you don’t do that.
Without this, execution would just continue into whatever function happens to be next in memory, if there is one.
It's still conforming C. It's not strictly conforming C, but no compiler extensions are. I'm not aware of any compiler that fully enforces strictly conforming C. It may not even be possible, since a strictly conforming program "shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior"[1], which would require the compiler to reject any program that produces such output even though some cases can only be detected at runtime. And since strictly conforming programs can only use features of the language and standard library defined in the C standard, the compiler can't even insert runtime checks to exit the program if such behavior is encountered (since that would be a language extension).
Implementation extensions have been around for decades. This seems as C as it can get! Also, the compiler has license to replace UB with anything, might as well be helpful.
Existing compilers that know nothing of the new annotations can still compile the code (without the checks obviously) using the macro definitions provided in the article.