Getting the last-level cache miss count on the Intel Kaby Lake architecture

Date: 2023-09-18

Problem description

I read an interesting paper, entitled "A High-Resolution Side-Channel Attack on Last-Level Cache", and wanted to find out the index hash function for my own machine, an Intel Core i7-7500U (Kaby Lake architecture), following the leads from this work.

To reverse-engineer the hash function, the paper mentions the first step as:

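 // Find the smallest n for which loading n addresses spaced 2^19 bytes
 // apart causes an LLC miss on the second pass.
 // Note: in C/C++ `^` is XOR, so 2^19 must be written as 1UL << 19.
 const unsigned long stride = 1UL << 19;  // set_count * line_size = 2^19
 unsigned long min = 0;
 for (unsigned long n = 16; ; n++)
 {
   // Two passes: the counter is reset before each pass, so only the
   // second pass is counted and any miss on the first run is ignored.
   for (int fill = 0; fill < 2; fill++)
   {
     // set pmc to count LLC misses, starting from zero
     reset_pmc();
     for (unsigned long a = 0; a < n; a++)
       load(a * stride);
   }

   // get the LLC miss count of the second pass
   if (read_pmc() > 0)
   {
     min = n;
     break;
   }
 }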

How can I code reset_pmc() and read_pmc() in C++? From all that I have read online so far, I think it requires inline assembly code, but I have no clue what instructions to use to get the LLC miss count. I would be obliged if someone could specify the code for these two steps.

I am running Ubuntu 16.04.1 (64-bit) on VMware Workstation.

P.S.: I found the events LONGEST_LAT_CACHE.REFERENCES and LONGEST_LAT_CACHE.MISSES mentioned in Chapter 18 of Volume 3B of the Intel 64 and IA-32 Architectures Software Developer's Manual, but I do not know how to use them.

Recommended answer

You can use perf as Cody suggested to measure the events from outside the code, but I suspect from your code sample that you need fine-grained, programmatic access to the performance counters.

To do that, you need to enable user-mode reading of the counters and also have a way to program them. Since those are restricted operations, you need at least some help from the OS kernel. Rolling your own solution is going to be pretty difficult, but luckily there are several existing solutions for Ubuntu 16.04:

Andi Kleen's jevents library, which among other things lets you read PMU events from user space. I haven't personally used this part of pmu-tools, but the stuff I have used has been high quality. It seems to use the existing perf_events syscalls for counter programming, so it doesn't need a kernel module.
The libpfc library is a from-scratch implementation of a kernel module and userland code that allows userland reading of the performance counters. I've used this and it works well. You install the kernel module which allows you to program the PMU, and then use the API exposed by libpfc to read the counters from userspace (the calls boil down to rdpmc instructions). It is the most accurate and precise way to read the counters, and it includes "overhead subtraction" functionality which can give you the true PMU counts for the measured region by subtracting out the events caused by the PMU read code itself. You need to pin to a single core for the counts to make sense, and you will get bogus results if your process is interrupted.
Intel's open-sourced Processor Counter Monitor library. I haven't tried this on Linux, but I used its predecessor library, the very similarly named¹ Performance Counter Monitor, on Windows, and it worked. On Windows it needs a kernel driver, but on Linux it seems you can either use a driver or have it go through perf_events.
Use the likwid library's Marker API functionality. Likwid has been around for a while and seems well supported. I have used likwid in the past, but only to measure whole processes in a manner similar to perf stat, not with the marker API. To use the marker API you still need to run your process as a child of the likwid measurement process, but you can programmatically read the counter values within your process, which is what you need (as I understand it). I'm not sure how likwid sets up and reads the counters when the marker API is used.

So you've got a lot of options! I think all of them could work, but I can personally vouch for libpfc since I've used it myself for the same purpose on Ubuntu 16.04. The project is actively developed and probably the most accurate (least overhead) of the above. So I'd probably start with that one.

All of the solutions above should be able to work for Kaby Lake, since the functionality of each successive "Performance Monitoring Architecture" seems to generally be a superset of the prior one, and the API is generally preserved. In the case of libpfc, however, the author has restricted it to only support Haswell's architecture (PMA v3), but you just need to change one line of code locally to fix that.

¹ Indeed, they are both commonly called by their acronym, PCM, and I suspect that the new project is simply the officially open sourced continuation of the old PCM project (which was also available in source form, but without a mechanism for community contribution).
