C++11 introduced a standardized memory model, but what exactly does that mean? And how is it going to affect C++ programming?
This article (by Gavin Clarke, who quotes Herb Sutter) says that,
The memory model means that C++ code now has a standardized library to call regardless of who made the compiler and on what platform it's running. There's a standard way to control how different threads talk to the processor's memory.
"When you are talking about splitting [code] across different cores that's in the standard, we are talking about the memory model. We are going to optimize it without breaking the following assumptions people are going to make in the code," Sutter said.
Well, I can memorize this and similar paragraphs available online (as I've had my own memory model since birth :P) and can even post them as answers to questions asked by others, but to be honest, I don't exactly understand this.
C++ programmers developed multi-threaded applications before C++11 too, so why does it matter whether it's POSIX threads, Windows threads, or C++11 threads? What are the benefits? I want to understand the low-level details.
I also get the feeling that the C++11 memory model is somehow related to C++11 multi-threading support, as I often see the two mentioned together. If it is, how exactly? Why should they be related?
As I don't know how the internals of multi-threading work, or what "memory model" means in general, please help me understand these concepts. :-)
First, you have to learn to think like a Language Lawyer.
The C++ specification does not make reference to any particular compiler, operating system, or CPU. It makes reference to an abstract machine that is a generalization of actual systems. In the Language Lawyer world, the job of the programmer is to write code for the abstract machine; the job of the compiler is to actualize that code on a concrete machine. By coding rigidly to the spec, you can be certain that your code will compile and run without modification on any system with a compliant C++ compiler, whether today or 50 years from now.
The abstract machine in the C++98/C++03 specification is fundamentally single-threaded. So it is not possible to write multi-threaded C++ code that is "fully portable" with respect to the spec. The spec does not even say anything about the atomicity of memory loads and stores or the order in which loads and stores might happen, never mind things like mutexes.
Of course, you can write multi-threaded code in practice for particular concrete systems, like pthreads or Windows. But there is no standard way to write multi-threaded code for C++98/C++03.
The abstract machine in C++11 is multi-threaded by design. It also has a well-defined memory model; that is, it says what the compiler may and may not do when it comes to accessing memory.
Consider the following example, where a pair of global variables are accessed concurrently by two threads:
               Global
               int x, y;

    Thread 1            Thread 2
    x = 17;             cout << y << " ";
    y = 37;             cout << x << endl;
What might Thread 2 output?
Under C++98/C++03, this is not even Undefined Behavior; the question itself is meaningless because the standard does not contemplate anything called a "thread".
Under C++11, the result is Undefined Behavior: two threads access the same non-atomic variables concurrently and at least one of the accesses is a write, which the standard calls a data race. Loads and stores of ordinary variables need not be atomic in general. Which may not seem like much of an improvement... And by itself, it's not.
But with C++11, you can write this:
               Global
               atomic<int> x, y;

    Thread 1            Thread 2
    x.store(17);        cout << y.load() << " ";
    y.store(37);        cout << x.load() << endl;
Now things get much more interesting. First of all, the behavior here is defined. Thread 2 could now print 0 0 (if it runs before Thread 1), 37 17 (if it runs after Thread 1), or 0 17 (if it runs after Thread 1 assigns to x but before it assigns to y).
What it cannot print is 37 0, because the default mode for atomic loads/stores in C++11 is to enforce sequential consistency. This just means all loads and stores must be "as if" they happened in the order you wrote them within each thread, while operations among threads can be interleaved however the system likes. So the default behavior of atomics provides both atomicity and ordering for loads and stores.
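To make the sequentially consistent case concrete, here is a minimal sketch that runs the two threads for real. The helper names `run_seq_cst_once` and `saw_forbidden_result` are made up for this illustration and are not part of the original answer; the variables are locals rather than globals so each trial starts from zero.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical helper: runs the two-thread example once and returns
// the pair {y, x} in the order "Thread 2" read them.
std::pair<int, int> run_seq_cst_once() {
    std::atomic<int> x{0}, y{0};
    int seen_y = 0, seen_x = 0;

    std::thread t1([&] {
        x.store(17);  // memory_order_seq_cst is the default
        y.store(37);
    });
    std::thread t2([&] {
        seen_y = y.load();
        seen_x = x.load();
    });

    t1.join();
    t2.join();  // after join(), t2's writes to seen_* are visible here

    // Atomicity: only whole values can ever be observed.
    assert(seen_y == 0 || seen_y == 37);
    assert(seen_x == 0 || seen_x == 17);
    return std::make_pair(seen_y, seen_x);
}

// Hypothetical helper: returns true if any trial observed "37 0",
// which sequential consistency rules out.
bool saw_forbidden_result(int trials) {
    for (int i = 0; i < trials; ++i) {
        std::pair<int, int> r = run_seq_cst_once();
        if (r.first == 37 && r.second == 0) return true;
    }
    return false;
}
```

Calling `saw_forbidden_result(1000)` from `main` should return false on a conforming implementation. A test like this can only fail to find a violation; it is the standard, not the test, that guarantees 37 0 never appears.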
Now, on a modern CPU, ensuring sequential consistency can be expensive. In particular, the compiler is likely to emit full-blown memory barriers for some or all of the accesses here, depending on the architecture. But if your algorithm can tolerate out-of-order loads and stores; i.e., if it requires atomicity but not ordering; i.e., if it can tolerate 37 0 as output from this program, then you can write this:
               Global
               atomic<int> x, y;

    Thread 1                              Thread 2
    x.store(17, memory_order_relaxed);    cout << y.load(memory_order_relaxed) << " ";
    y.store(37, memory_order_relaxed);    cout << x.load(memory_order_relaxed) << endl;
How much faster this is than the previous example depends on the CPU; the more aggressively the hardware reorders memory operations, the more you tend to gain.
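A common situation that needs atomicity but not ordering is a shared counter. The sketch below is a different example from the x/y program above, chosen because its result is deterministic; the function name `count_events` and its parameters are assumptions made for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: several threads bump one counter. Each increment
// must be atomic, but nobody cares how increments from different
// threads are ordered relative to other memory operations.
long count_events(int threads, int per_thread) {
    assert(threads >= 0 && per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& w : workers) w.join();

    // join() orders all the increments before this final read.
    return counter.load(std::memory_order_relaxed);
}
```

For instance, `count_events(4, 10000)` returns 40000: relaxed ordering never loses an increment, it only gives up guarantees about how the increments are ordered relative to other memory accesses.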
Finally, if you just need to keep particular loads and stores in order, you can write:
               Global
               atomic<int> x, y;

    Thread 1                              Thread 2
    x.store(17, memory_order_release);    cout << y.load(memory_order_acquire) << " ";
    y.store(37, memory_order_release);    cout << x.load(memory_order_acquire) << endl;
This takes us back to the ordered loads and stores, so 37 0 is no longer a possible output, but it does so with minimal overhead. (In this trivial example, the result is the same as full-blown sequential consistency; in a larger program, it would not be.)
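The usual reason to reach for release/acquire is to publish ordinary, non-atomic data from one thread to another. The following is a minimal sketch of that message-passing pattern, not code from the original answer; `publish_and_consume`, `payload`, and `ready` are names invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one thread writes plain data and then sets a
// flag with release; the other spins on the flag with acquire and
// then reads the data.
int publish_and_consume() {
    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready{false};   // the synchronizing flag
    int received = -1;

    std::thread producer([&] {
        payload = 42;                                  // 1. write the data
        ready.store(true, std::memory_order_release);  // 2. publish it
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();                 // 3. wait for the flag
        }
        // The acquire load that saw `true` synchronizes with the release
        // store, so the write to payload is visible and this is no data race.
        assert(payload == 42);
        received = payload;
    });

    producer.join();
    consumer.join();
    return received;
}
```

With `memory_order_relaxed` on the flag instead, the read of `payload` would be a data race and the assertion would no longer be guaranteed to hold.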
Of course, if the only outputs you want to see are 0 0 or 37 17, you can just wrap a mutex around the original code. But if you have read this far, I bet you already know how that works, and this answer is already longer than I intended :-).
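For completeness, the mutex version might look like the sketch below. It is one possible reading of "wrap a mutex around the original code"; the helper name `run_with_mutex_once` is an assumption, and the variables are locals so that each call starts from zero.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: the original non-atomic example, with each
// thread's two statements executed as one critical section.
std::pair<int, int> run_with_mutex_once() {
    int x = 0, y = 0;  // plain ints again, now protected by the mutex
    std::mutex m;
    int seen_y = 0, seen_x = 0;

    std::thread t1([&] {
        std::lock_guard<std::mutex> lock(m);
        x = 17;
        y = 37;
    });
    std::thread t2([&] {
        std::lock_guard<std::mutex> lock(m);
        seen_y = y;  // sees both writes or neither
        seen_x = x;
    });

    t1.join();
    t2.join();
    assert(x == 17 && y == 37);  // both writes are done once t1 is joined
    return std::make_pair(seen_y, seen_x);
}
```

Because Thread 2 holds the lock for both reads, it observes either the state before Thread 1's critical section or the state after it, so the returned pair is always {0, 0} or {37, 17}.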
So, bottom line. Mutexes are great, and C++11 standardizes them. But sometimes for performance reasons you want lower-level primitives (e.g., the classic double-checked locking pattern). The new standard provides high-level gadgets like mutexes and condition variables, and it also provides low-level gadgets like atomic types and the various flavors of memory barrier. So now you can write sophisticated, high-performance concurrent routines entirely within the language specified by the standard, and you can be certain your code will compile and run unchanged on both today's systems and tomorrow's.
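As an illustration of the double-checked locking pattern mentioned above, here is one way it can be written with C++11 atomics. This is a sketch under assumptions: the `Config` type and `get_config` function are hypothetical, and the object is deliberately never deleted to keep the example short.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical type standing in for something expensive to build.
struct Config {
    int value = 7;
};

// Double-checked locking: the fast path is a single acquire load;
// the mutex is taken only while the object does not exist yet.
Config* get_config() {
    static std::atomic<Config*> instance{nullptr};
    static std::mutex init_mutex;

    Config* p = instance.load(std::memory_order_acquire);  // first check
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(init_mutex);
        // Second check, under the lock; the mutex already orders this load.
        p = instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Config();  // intentionally leaked in this sketch
            // Release: the constructed object becomes visible to any
            // thread whose acquire load sees this pointer.
            instance.store(p, std::memory_order_release);
        }
    }
    assert(p != nullptr);
    return p;
}
```

In everyday code a function-local `static Config c;` or `std::call_once` gives the same once-only initialization with less room for error, since C++11 makes both thread-safe; the explicit version is shown only because it exercises the low-level primitives.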
Although to be frank, unless you are an expert and working on some serious low-level code, you should probably stick to mutexes and condition variables. That's what I intend to do.
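For comparison with the atomics above, a hand-off using only the high-level tools could be sketched like this. The function `wait_for_value` and its local names are invented for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: a consumer blocks on a condition variable
// until a producer has stored a value under the mutex.
int wait_for_value() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;  // protected by m
    int value = 0;       // protected by m
    int received = -1;

    std::thread consumer([&] {
        std::unique_lock<std::mutex> lock(m);
        // The predicate form of wait() handles spurious wakeups and the
        // case where the producer finished before we started waiting.
        cv.wait(lock, [&] { return ready; });
        received = value;
    });
    std::thread producer([&] {
        {
            std::lock_guard<std::mutex> lock(m);
            value = 37;
            ready = true;
        }
        cv.notify_one();  // notify after releasing the lock
    });

    producer.join();
    consumer.join();
    assert(received == 37);
    return received;
}
```

There are no memory orders to choose here: locking and unlocking the mutex provides all the ordering the two threads need.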
For more on this stuff, see this blog post.