VLA usage can also be analyzed (e.g. when bounded in a simple way by a function argumen) and then may allow reduced stack usage compared to fixed-size arrays sized for a worst case.
This does not have much to do with my point, but, anyway, basically any C compiler supports them. MSVC does not, but it also does not support a recent standard so you can not use MSVC to compile C, just some outdated subset.
Pedantically, yes VLAs are a mandatory part of C99 (only). Practically, there has been some resistance so they were reverted to optional in later standards (in C11/17, the whole thing is optional; in C23, variably modified array types are mandatory but the ability to allocate arrays of such types on the stack is not). In any case, MSVC is quite a bad benchmark as far as C (not C++) standards conformance goes—it’s been quite some time since Sutter’s (in)famous post[1] and things have improved, but not to the point that I’d believe C to be a priority for Microsoft.
(Note MSVC has alloca—and Microsoft’s own libraries in the past have done unwise things with it—so the safety argument for the lack of support does not fly.
I’ve thrown all my projects i have written in c for posix systems in the last 15 years against msvc and had only one project which had an issue. Because a dep used VLAs.
Both C and C++ compilers (in fact, they share this part) very aggressively exploited undefined behavior for performance. But I this was certainly not adversarial. Programmers also regularity picked optimizations over safety. I think nowadays the unsafety of C with modern tooling vs the safety of - say - Rust is very much exaggerated.
I does not work fine in C++ when N and M are not compile-time constants, which is basically always the case in any interesting numerical algorithm. Also not in Rust.
It works fine in C though, or FORTRAN, or Ada, or ALGOL 60, ...
Why is that?
I find Ada much nicer than the C-languages when it comes to arrays: A'Range, A'Length, A'First, and A'Last are super-useful, as is the unconstrained array.
You can even use unconstrained arrays to provide the same functionality that Optional does in functional-programming, provided the element-type can be an element of an array:
-- Here we define an index-type with one value.
Subtype Boolean_Index is Boolean range True..True;
-- And here we define an array indexed by it, but can also have length 0.
Type Optional(Boolean_Index range <>) of Element;
And there you have the mechanism for Optional; just use "For Object of Optional_Array Loop" to enclose your operations and bam, it works perfectly.
I guess you aren't their target customer anyway, NVidia isn't that found of pure C code, with first class tooling for C, C++, Fortran, Python JIT, Ada and most recently Rust.
The std:mdspan proposal came from NVidia employees, alongside AMD and HPC research labs.
The thing is that I rewrite high performance numerical code on GPUs and the CUDA part is what sucks most. And the moment one uses templates, the compilation times make it insufferable. I really do not understand why people put up with this garbage. I am really looking forward to the day where I can remove CUDA from my projects and replace it with compiler-supported offloading is really
The kernel removed VLAs, I am more talking about vm types. But even for VLA - while I had a small role in that undertaking myself - I think it was a stupid mistake from a security point of view to remove VLAs from the kernel. Google pays for a lot of nonsense...
I rather would say it works nicely in auto-generating the complex indexing operation for n-dimensional arrays which makes it a lot more convenient and less error-prone to write such code. The compiler may also flatten a loop.
The array of pointer hack used previously to similate 2d arrays using an array to pointers to arrays should not be used outside of special algorithms, as it is error prone and slow.
Maybe, but also irrelevant to the discussion because whether you write mat[b * A + a] by hand or mat[b][a] and let the compiler frontend expand then makes no difference to the optimizer.
It is a struggle though to get the improvements through the committee. Especially the C++ folks from the Clang side fight very hard against it, this is - for example - why we not have forward declarations where I already had weak consensus, but the clang area team made it clear they will never implement it.
The allocator interface is defined via these names. Just supply your own realloc. Of course you need to satisfy the constraints of realloc guaranteed to the compiler, or you need to invoke it in freestanding mode.
Yes, attacker controlled size without limit is bad (and this is also true for heap allocations). For VLAs there is -Wvla-larger-than that can be used to ensure there is a hard limit. To understand the risks of VLAs one also has to compare it to the alternatives. A fixed-size array on the stack is basically always worse. alloca is substantially worse. heap allocation may be a bit better, but also much slower.
What IMHO Rust does not get right and why I do not use it: long compilation times, high complexity, its syntax, polymorphism based on monomorphization, the requirement for many dependencies to get anything done, an ecosystem susceptible to supply-chain attacks, no ISO standard.
I'm sure I could argue with you about the actual technical differences but this part in particular is very, very stupid.
JTC1/SC22 † shouldn't exist at all. A committee structure is a bad way to do this work, and the practice of having periodic meetings - exclusively in person for much of the time these existed - actually makes it less rather than more useful.
ISO mandates a bunch of rules and procedures which surely make some sense if you're agreeing on thread dimensions for oil pipelines but are completely inappropriate for this work and yet because they're ISO committees the WG14 and WG21 processes are captured.
I don't think it makes good sense to use an SDO for this work, but if you must have an SDO for some reason beyond ego then you could do a lot better than ISO. Check out TC39 for example.
† The C and C++ standards committees are respectively Working Groups 14 and 21 of the Sub-committee on Programming Languages, SC22, of the (First and only) Joint Technical Committee between ISO and the IEC. Yes it's committees all the way down. "This programming language standard could have been an email".
ISO agreed that despite there being an existing, popular, broadly supported and open XML document standard they should define Microsoft's proprietary alternative OOXML as an international "standard". They even held votes repeatedly until the voters gave the "correct" answer... no worry about "industry donations" there.
Out of curiosity, what do you think is wrong with monomorphization-based polymorphism? The other alternatives I'm aware of are 1. type-erasure via v-table based dynamic dispatch (which Rust also has in the form of the `dyn` keyword), which has performance and memory-allocation overhead and 2. macros, which Rust also has and, if used for polymorphism, would essentially be like compile-time monomorphization, but clunkier.
Maybe I'm missing something though and there are other alternatives done differently in other languages?
The correct choice IMHO is type-erasure. It does not necessarily have overhead, because optimizers can specialize or devirtualize. Of course, this my depend on how you implement your language, but in C this works nicely. The problem with monomorphization is that it leads to exponential expansion, which later passes of the compiler can not unify again (at least this is much harder than not expanding in the first place). It should also fundamentally limit what you can do, because expansion has to stop at some point, but I haven't thought about this too much.
I also think that where you want monomophization, macro seem fine. I do not think this necessarily has to be clunky, but this is just a guess.
Type-erasure does have an inherent overhead. Sure, optimizations can be made, but they can be fickle and specialization is basically implicit monomorphization.
Using C macros to replicate Rust's monomorphism has several drawbacks: they are inherently unhygienic, even in comparison to Rust's own; you can't set type-bounds; they aren't even a part of C proper, etc.
I prefer Rust's approach with the choice between generics, macros, dyn and Any.
What inherent overhead does type-erasure have? The optimizer can always specialize just like for monomorphization. The difference is that it does not have to do this. Monomorphization specializes everything before the optimizer even has a chance to look the the code. So it fundamentally can not have advantages, only restrictions.
C macros are certainly a proper part of C and one can also certainly add type-bounds. But yes, they are not ideal. Still, if one wanted to do this, one could certainly improve them a lot for type-generic programming. I would prefer this to having macros, generics, a const expression sublanguage, and vtables.
The inherent overhead of type-erasure is the vtable. Which is funny, because you said you prefer one over the other at the end, when they are basically synonymous. The compiler may choose to devirtualize and specialize, but that isn't guaranteed. If you want this behavior, &dyn does the exact same thing.
Specialization provides more opportunities for optimizations, whereas proper type-erasure (which bars specialization optimizations) doesn't due to lack of type information.
"C macros" are a part of the preprocessor, which runs before the actual C compiler. As such, it lacks all semantical information the C compiler would have at that point, such as function implementions. In practice, macros in C serve two purposes: manipulating C source code (which Rust macros can also do, but with more hygiene); specialization polymorphism, but worse (in which both Rust's generics and C++ templates do better).
The vtable can be removed in all cases where you specialize by monomorphization, so the overhead compared to it is not really inherent.
Type-erasure can be undone by specialization. In contrast, monomorphization specializes the code first, at which point it becomes much more difficult to unify it again.
Sorry, my comment about preference was badly phrased: What I meant is that I prefer to make macros better, than having all the different but partially overlapping techniques in the same language. This is the complexity I am complaining about in both C++ and Rust.
As I stated in my follow up, vtables can only be removed when the type information is still available, which, generally, aligns with specialization, except when dealing with libraries (even with LTO). You say the "vtable can be removed in all cases of monorphization", but that isn't guaranteed and often requires double guessing the compiler to ensure it still knows the type, where as monorphization makes it explicit when the compiler doesn't know the type anymore and that specialization must occur.
The problem with your suggestion about macros is that it misunderstands the reason why different techniques exist. While they may overlap, they have different primary niches: Rust macros provide codegen and inline DSL, Rust generics/C++ templates provide monomorphism, C++ constexpr/Rust const fn evaluate expressions in compile time, etc.
Extending preprocessor macros, aside from requiring integration into the C language proper, would likely create something much more complex and with poor DX. Specially so if it tries to fulfill all niches, as each of them often operate at different abstraction levels, with different ergonomic requirements that are often at odds with each other.
I just noticed you said "The compiler can always specialize". That's indisputably false. That only happens if the compiler has enough information to infer the concrete type at the call point. Conditionals, collections, or anything that may throw doubt about original type info and their function implementations can (and normally will) disable specialization.
Not necessarily. In pure theory, yes. But in pure theory, monomorphized code can be deduplicated too.
Monomorphization often forces types information to flow, whereas type-erasure considers that a bonus. As such, a compiler may accidentally introduce opaque boundaries that hide type information. Even in cases where this is the programmer's fault and the type info is truly lost, in monomorphisation, that becomes an explicit error, while, for devirtualization, since it's just an optimization, the call is silently revirtualized (until someone notices it while profiling/decompiling). One has explicit intent, the other doesn't.
The point is much easier to propagate information (and not accidentally lose it) than to deduplicate after specialiation, so I do not agree that this is only of theoretical relevance.
I agree that monomorphization has explicit rules that force specialization, I do not think this is an advantage at all because it is too rigid and one can introduce explicitness at the language level also for devirtualization where this is needed. In this sense, monomorphization is premature optimization which limits what can be expressed at the language level and makes the job of the optimizer much harder (and practically impossible to undo the damage completely).
You seem to be underestimating how easy it is to lose type information. Creating a array of type-erased is already enough to cause confusion, even if the surrounding code is enough to prove the array only has one type. If we are to assume perfect deductive ability for type-erasure, why not for monomorphism as well?
Monomorphization isn't premature optimizations either. It and type-erasure are semantically different. For instance, only monorphization can enable inlining of data structures (as devirtualizating an array of fat pointers involves dealing with ownership concerns, among other things). You either have to type-erase the array itself or box all the items, none of which are very ergonomic for a system-level programming. Also, in terms of performance, monomorphization generally makes the compiler's job easier overall in comparison to type-erasure (specially in Rust where types also encode lifetime), so I don't see the damage unless you are dealing with an extremely size-constrained environment or dealing with instruction cache misses (which is an indication of poor abstractions).
Finally, as stated elsewhere, it's trivial to implement any type-erasure polymorphism in a monomorphic system (save for very specific exceptions, related to hardware), but the reverse is not true. To do so reliably will require assistance of an external codegen tool, like the C macros, generally recreating monomorphism, but worse, as the compiler doesn't actually know the polymorphic code is related, making deduplication even harder.
I don't see why it would easy to lose type information in circumstances where you can do monomorphization. Note that I do not assume "perfect deductive ability". The practical observation is simply that it is far more effort to deduplicate than to clone.
From the code, we can already tell all elements have the same type, but the compiler is already adding checks. One of the options would be to convert the array of fat pointers into a fat slice. But now you have to enlarge the slice's vtable with all possible item operations, even if we don't use them often. Alternatively, use monomorphism. Or use both.
I see, this is what you meant. GCC can not currently do this. But this also not something that would correspond to a monomorphic version. A polymorphic function over a generic array would be specialized to one member type via monomorphization. This would correspond to this: https://godbolt.org/z/nd333xeTP
Aside from being heavily optimization-dependant, thinking through this, I've realized that type-erasure is a bit of a premature abstraction. It forces programmers to use indirection and pass-by-reference/pointer. It forces functions to be type size and alignment-agnostic. That's why type-erasure can't express monomorphism proper, whereas the reverse is not true.
It is trivial to see that the compiler could do inlining also in this case. Your modification does not "break" inlining in the sense of making it impossible. In fact, just making the vtables "const" is enough and it does it: https://godbolt.org/z/oYv47xx8G Or adding "inline" would also do it. (But this not why your argument is wrong.)
The compiler chooses to not inline in your example. And this is the point: The compiler makes inlining decisions based on some heuristics exactly because inlining is not always an advantage (even if it is in toy examples), otherwise it would do it far more often. Implementing type-erasure gives it the freedom to do this while monomorphization hardcodes it. It takes away this freedom. You can get the same result as monomorphization by forcing certain inlining or function-cloning decisions, but the reverse is much harder: you can not automatically deduplicate the already expanded function. (in theory yes, in practice no)
The argument that for a language with type-erasure the inlining heuristics of this specific C compiler may not be ideal, is rather irrelevant. (but interprocedural value propagation is rather smart in principle)
My issue isn't that the modification made it impossible to inline, more so how trivial it was: I just added a different type. Generally, optimizations are disabled not because the compiler finds them inneficent, but because it lacks enough contextual information/intelligence to safely add it. So similar to what you said, in theory, the compiler can optimize type-erasure just like monomorphism, but, in practice, it lacks info to do so. Either by programmer accident, language/compiler limitations, or both.
Also, monomorphization doesn't actually take the compiler freedom away (on its own) to optimize. Similar to how type-erasure can be optimized to behave like monomorphization through devirtualization, monomorphized calls can be virtualized (which Swift shows). The problem is that it takes away transparency from the programmer. It also changes the semantics of the call behind the scene (type-erasure is strictly pass-by-pointers, monomorphism is not). That wouldn't work well for a low-level language.
But your example did not show that the compiler lacks the contextual information. My argument is exactly that in all cases where one can do monomorphization, this contextual information is trivially available because it simply a small part of what can be done with interprocedural constant-propagation anyhow. I think this is easy to see.
I am not commenting on your Gish Gallop of new arguments.
How did it not lack of contextual info? By adding "const", you provided more info to the compiler, and it decided it was safe to inline, despite no actual behavior change. My point is, despite the possibility, these optimizations are easy to disable by mistake.
Also, there's no gish-gallop. You said the compiler can optimize type-erasure like monomorphization, but not monomorphization like type-erasure, and I said that's not true: the compiler can, in fact, do that. I simply addressed your argument. The rest simply elaborates on why languages why might not do it.
No, you did not address my argument. You need to show that it is impossible (or hard) to do function cloning when using interprocedural constant-propagation in a situation where monomorphism does this. But nothing you said contradicts this and the fact that a C compiler may sometimes decide to inline or not a function is completely irrelevant to this question.
In fact, it was exactly my argument that it is an advantage of type-erasure that the compiler has more freedom. An example where the compiler does not specialize a function just demonstrates this flexibility.
I feel like you are attacking a strawman here. My point isn't that it's impossible to optimize, but such optimizations are fickle and easy to break by programmer error and require adequate language expressivity (although low-level languages generally provide that).
You say that the compiler made the decision to not inline it because it judged it to be the most efficient, yet, by adding the "const", thus adding more info, it did inline it. Why did the heuristics change here, even though the logic remained the same? I argue it's not because the compiler decided it would be more performant to not inline, but that it was acting on the side of caution.
Finally, if giving the compiler more freedom to make choices is your goal, then forcing it to start by either type-erasure or monomorphization is the wrong answer. The compiler should be free to choose type-erasure/monomorphization from the start in such case. However, that introduces limitations to the language. Since the generic function is prohibited from knowing the type size at compile time, it can't allocate space for it in the stack. Long story short, I think you'd be forcing the programmer into an abstraction they might not need nor would necessarily help them, and then hoping the compiler to optimize appropriately.
Personally, I care about expressivity and transparency, thus I think having both natively supported is preferable. But if one must be chosen over the other, monomorphism should come first, as you can build a vtable with it (also, there are many ways to build vtables and dynamically diapatch).
Not sure what numbers you are talking about. If you use qsort from the C library the comparison function will not be inlined, but if you provide your own, this is no problem.
If just "providing my own" would help why wouldn't the stdlib benefit too? You're going to have to spell out what you think can actually work here if you want me to believe there's "no problem".
It would also, but nobody cares enough because qsort is already fast enough for most things, and if you cared it is simply enough to do yourself. Are you doubting that C compilers can devirtualize function calls? Here is a small example that illustrates this. The compiler dervirtualizes all calls than folds the result: https://godbolt.org/z/E6cMMr8vx
I haven’t used rust, but having gotten used to C and C++ at a time, I expect that would happen with rust, too, if I started using it. Because of that, I think this is just a matter of familiarity.
> polymorphism based on monomorphization
Implementation detail (yes, there is only one implementation at the moment, but this means it can be changed without changing the language)
> the requirement for many dependencies to get anything done
Fixable if a party is willing to write or package together a fairly large set of dependencies into a single package.
> an ecosystem susceptible to supply-chain attacks
Fixable if that party is trustworthy. Also, for which languages is this less of a problem? You either have third-party libraries and a potential security problem, or you don’t, and need to write more code.
Some of Rust's problems may be fixable, but they are not being fixed at the moment and with something as complex as Rust, this is unlikely. I do not think monomorphization is an implementation detail, rather than a fundamental language design mistake that is difficult to correct.
Beg to differ. After musing for while, I realized Rust could only have chosen monomorphism for its goals, thus it's not a fundamental design mistake, but a necessary choice. One of the overall philosophies is zero-cost abstractions, and only designing a polymorphic system with monomorphism in mind may provide that, because the alternative requires pointers (different type sizes, vtable, etc.).
While, as discussed previously, devirtualization + inlining is a thing, it is not assured and relatively easy to disable. Writing a program relying on these optimizations would produce something bristle and error-prone, and building a polymorphic system around this would be antithetical to the Rust's ZCO philosophy.
Meanwhile, monomorphization can fit functions around the size of the type arguments and access method implementations directly, removing the need for any indirection in the first place. While indirection-based polymorphism can't reliably reimplement monomorphic polymorphism, the reverse is not true. You can reliably reimplement any indirection-based polymorphism with monomorphism, from fat pointers (in fact, Rust already has trait objects to help with that), class-based hierarchies, dynamic typing, etc.
reply