|
10 | 10 | "- [Introduction](#Introduction)\n", |
11 | 11 | "- [Phases in the Optimization Workflow](#Phases-in-the-Optimization-Workflow)\n", |
12 | 12 | "- [Profiling and Tuning Your Code](#Profiling-and-Tuning-Your-Code)\n", |
| 13 | + " - [Analysis using Intel VTune Profiler](#Analysis-using-Intel-VTune-Profiler) \n", |
13 | 14 | "- [Locality Matters](#Locality-Matters)\n", |
14 | 15 | "- [Rightsize Your Work](#Rightsize-Your-Work)\n", |
15 | 16 | "- [Parallelization](#Parallelization)\n", |
|
53 | 54 | { |
54 | 55 | "cell_type": "markdown", |
55 | 56 | "id": "bf9e2b6a-fbe6-42fc-b4bc-aec2f4c65b07", |
56 | | - "metadata": {}, |
| 57 | + "metadata": { |
| 58 | + "jp-MarkdownHeadingCollapsed": true |
| 59 | + }, |
57 | 60 | "source": [ |
58 | 61 | "## Profiling and Tuning Your Code\n", |
59 | | - "After you have designed your code for high performance, the next step is to measure how it runs on the target accelerator. Add timers to the code, collect traces, and use tools like VTune Profiler to observe the program as it runs. The information collected can identify where hardware is bottlenecked and idle, illustrate how behavior compares with peak hardware roofline, and identify the most important hotspots to focus optimization efforts." |
| 62 | + "After you have designed your code for high performance, the next step is to measure how it runs on the target accelerator. Add timers to the code, collect traces, and use tools like VTune Profiler to observe the program as it runs. The information collected can identify where hardware is bottlenecked and idle, illustrate how behavior compares with peak hardware roofline, and identify the most important hotspots to focus optimization efforts.\n", |
| 63 | + "\n", |
| 64 | + "#### Analysis using Intel VTune Profiler\n", |
| 65 | + "\n", |
| 66 | + "Refer back to this section whenever you need to analyze code performance with Intel VTune Profiler while working through the code examples in the different modules.\n", |
| 67 | + "\n", |
| 68 | + "##### Steps for VTune analysis:\n", |
| 69 | + "- Modify the code and compile it\n", |
| 70 | + "- Use the VTune command line to collect profiling data\n", |
| 71 | + "- Open the VTune results using the Intel VTune Profiler GUI\n", |
| 72 | + " - If the system you are using does not have a GUI, compress and download the VTune results directory and open the results on a computer that has a GUI and Intel VTune Profiler installed.\n", |
| 73 | + "\n", |
| 74 | + "##### Detailed Steps for VTune Analysis:\n", |
| 75 | + "\n", |
| 76 | + "- Modify the module's example code and then \"Build and Run\"; this generates the binary `lab/a.out`\n", |
| 77 | + "- Then, in \"Terminal\", go to the current module directory and run the following VTune command (change the `-result-dir` value from `vtune_data` to a name that identifies your code):\n", |
| 78 | + "```\n", |
| 79 | + "vtune -collect gpu-hotspots -result-dir vtune_data $(pwd)/lab/a.out\n", |
| 80 | + "```\n", |
| 81 | + "- Compress the VTune results directory so it can be copied to your local (GUI) computer:\n", |
| 82 | + "```\n", |
| 83 | + "tar -czvf vtune_data.tgz vtune_data\n", |
| 84 | + "```\n", |
| 85 | + "- Download the compressed VTune results:\n", |
| 86 | + " - If using Jupyter, right-click the `*.tgz` file and select \"Download\"\n", |
| 87 | + " - If using `ssh`, use `scp` to copy the `*.tgz` file to your GUI computer\n", |
| 88 | + "- On your GUI computer, uncompress the VTune results:\n", |
| 89 | + "```\n", |
| 90 | + "tar -xvf vtune_data.tgz\n", |
| 91 | + "```\n", |
| 92 | + "- On your computer, install \"Intel VTune Profiler\" from [__Intel oneAPI Base Toolkit__](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html)\n", |
| 93 | + "- Open __Intel VTune Profiler__, select \"Open Results\" in the \"Welcome\" tab, browse to the downloaded VTune results directory, and select the `*.vtune` file.\n", |
| 94 | + "- Navigate to the \"Graphics\" tab and then the \"Platform\" tab to analyze the performance timeline and compute stats\n", |
| 95 | + "- Refer to the Intel VTune Profiler documentation for more information\n", |
| 96 | + "\n", |
| 97 | + "<img src=\"assets/vtune_profiler.png\">\n" |
60 | 98 | ] |
61 | 99 | }, |
62 | 100 | { |
|
155 | 193 | ], |
156 | 194 | "metadata": { |
157 | 195 | "kernelspec": { |
158 | | - "display_name": "Python 3 (Intel® oneAPI 2023.2)", |
| 196 | + "display_name": "Python 3 (ipykernel)", |
159 | 197 | "language": "python", |
160 | | - "name": "c009-intel_distribution_of_python_3_oneapi-beta05-python" |
| 198 | + "name": "python3" |
161 | 199 | }, |
162 | 200 | "language_info": { |
163 | 201 | "codemirror_mode": { |
|
169 | 207 | "name": "python", |
170 | 208 | "nbconvert_exporter": "python", |
171 | 209 | "pygments_lexer": "ipython3", |
172 | | - "version": "3.9.16" |
| 210 | + "version": "3.11.5" |
173 | 211 | } |
174 | 212 | }, |
175 | 213 | "nbformat": 4, |
|
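The compress and uncompress steps in the VTune walkthrough can also be scripted. The sketch below is a minimal example, not part of the notebook, that uses Python's standard `tarfile` module. `archive_results` and `extract_results` are hypothetical helper names, and `vtune_data` is assumed to be the `-result-dir` passed to `vtune`. Opening the archive in `"w:gz"` mode produces a gzip-compressed file, so the `.tgz` name matches its contents.

```python
import tarfile
from pathlib import Path
from typing import Optional


def archive_results(result_dir: str, archive_path: Optional[str] = None) -> Path:
    """Pack a VTune result directory into a gzip-compressed tarball.

    Hypothetical helper; similar in effect to `tar -czvf <dir>.tgz <dir>`.
    """
    src = Path(result_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"result directory not found: {src}")
    # Default output name: <result_dir>.tgz next to the result directory.
    out = Path(archive_path) if archive_path else src.parent / (src.name + ".tgz")
    # "w:gz" writes a gzip-compressed tar archive.
    with tarfile.open(out, "w:gz") as tar:
        # arcname stores only the directory name, not the full path.
        tar.add(src, arcname=src.name)
    return out


def extract_results(archive_path: str, dest: str = ".") -> None:
    """Unpack the tarball on the GUI computer (like `tar -xzvf`)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # Use the safer "data" extraction filter when this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(path=dest, **kwargs)
```

For example, `archive_results("vtune_data")` would write `vtune_data.tgz` next to the result directory on the remote system, and `extract_results("vtune_data.tgz")` would unpack it on the GUI computer before you open the `*.vtune` file in VTune Profiler.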