Yes, you're right. It's turtles until RAM locking mechanisms.
RAM does the locking inside it, and expose it as "compare-and-set" and "compare-and-swap" etc primitives. Various computing languages that use those primitives usually call that "atomic data structures". The thing is that RAM is way faster in that regard than locking on user/kernel level, so for program it looks like there's no locks. But atomics indeed do slow down your program, if just a little, because "compare-and-set" is still slower than just "set".
The RAM does not know anything about locking, it's the cache coherence protocol. The CPU will request a cache line in exclusive state, it does the operation on the memory and ensures that during it has always been exclusive to that core. After all the other cores will observe the change (and depending on the memory model+ordering, the operations that have/will happen before and after).
RAM does the locking inside it, and expose it as "compare-and-set" and "compare-and-swap" etc primitives. Various computing languages that use those primitives usually call that "atomic data structures". The thing is that RAM is way faster in that regard than locking on user/kernel level, so for program it looks like there's no locks. But atomics indeed do slow down your program, if just a little, because "compare-and-set" is still slower than just "set".