Commit 74e549f

Alexander Sannikov authored
Libraries/MPI: added GPU samples (#1965)
1 parent e2249f5 commit 74e549f

20 files changed

Lines changed: 2024 additions & 0 deletions

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
all:
	make -C src/01_jacobian_host_mpi_one-sided
	make -C src/02_jacobian_device_mpi_one-sided_gpu_aware
	make -C src/03_jacobian_device_mpi_one-sided_device_initiated

debug:
	make debug -C src/01_jacobian_host_mpi_one-sided
	make debug -C src/02_jacobian_device_mpi_one-sided_gpu_aware
	make debug -C src/03_jacobian_device_mpi_one-sided_device_initiated

clean:
	make clean -C src/01_jacobian_host_mpi_one-sided
	make clean -C src/02_jacobian_device_mpi_one-sided_gpu_aware
	make clean -C src/03_jacobian_device_mpi_one-sided_device_initiated
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
Copyright Intel Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
# `Distributed Jacobian Solver SYCL/MPI` Sample

The `Distributed Jacobian Solver SYCL/MPI` sample demonstrates the GPU-aware MPI-3 one-sided communications available in the Intel® MPI Library.

| Area | Description
|:--- |:---
| What you will learn | How to use MPI-3 one-sided communications with GPU buffers and SYCL* offload to reach better compute/communication overlap.
| Time to complete | 45 minutes
| Category | Concepts and Functionality

For more information on Intel® MPI Library and complete documentation of all features,
see the [Intel® MPI Library Documentation](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library-documentation.html) page.

## Purpose

The sample demonstrates an actual use case (Jacobian solver) for MPI-3 one-sided communications, which make it possible to overlap the compute kernel with communications. The sample illustrates how to use host- and device-initiated one-sided communication with SYCL kernels.

## Prerequisites

| Optimized for | Description
|:--- |:---
| OS | Linux*
| Hardware | 4th Generation Intel® Xeon® Scalable Processors <br> Intel® Data Center GPU Max Series
| Software | Intel® MPI Library 2021.11

## Key Implementation Details

This sample implements a well-known distributed 2D Jacobian solver with 1D data distribution. The sample uses Intel® MPI [GPU Support](https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/current/gpu-support.html).

The sample has three variants demonstrating different approaches to the Jacobi solver.

### `01_jacobian_host_mpi_one-sided`

This program demonstrates a baseline implementation of the distributed Jacobian solver. In this sample you will see the basic idea of the algorithm, as well as how to implement the halo exchange that the solver requires using MPI-3 one-sided primitives.

The solver is an iterative algorithm. In each iteration, a process first recalculates its border values and then transfers them to the neighbor processes, which use them in the next iteration. Each process recalculates the values of its internal points for the next iteration in parallel with the communication. After a number of iterations, the algorithm reports NORM values for validation purposes.

### `02_jacobian_device_mpi_one-sided_gpu_aware`

This program demonstrates how the same algorithm can be modified to add GPU offload capability. The program comes in two versions: OpenMP and SYCL. The program illustrates how device memory can be passed directly to MPI one-sided primitives. In particular, device memory may be passed to the `MPI_Win_create` call to create an RMA window placed on a device. In addition to device placement of the RMA window, device memory can be passed to the `MPI_Put`/`MPI_Get` primitives as a target or origin buffer.

> **Note**: Only contiguous MPI datatypes are supported.

### `03_jacobian_device_mpi_one-sided_device_initiated`

This program demonstrates how to initiate one-sided communications directly from the offloaded code. The Intel® MPI Library allows calls to some communication primitives directly from the offloaded code (SYCL or OpenMP). This is the list of supported primitives:

- `MPI_Put`
- `MPI_Get`
- `MPI_Win_lock` / `MPI_Win_lock_all`
- `MPI_Win_unlock` / `MPI_Win_unlock_all`
- `MPI_Win_flush` / `MPI_Win_flush_all`
- `MPI_Win_fence`

To enable device-initiated communications, you must set an extra environment variable: `I_MPI_OFFLOAD_ONESIDED_DEVICE_INITIATED=1`.

## Build the `Distributed Jacobian Solver SYCL/MPI` Sample

> **Note**: If you have not already done so, set up your CLI
> environment by sourcing the `setvars` script in the root of your oneAPI installation.
>
> Linux*:
> - For system-wide installations: `. /opt/intel/oneapi/setvars.sh`
> - For private installations: `. ~/intel/oneapi/setvars.sh`
> - For non-POSIX shells, like csh, use the following command: `bash -c 'source <install-dir>/setvars.sh ; exec csh'`
>
> For more information on configuring environment variables, or if you are using a Unified Directory Layout, see
> *[Use the setvars and oneapi-vars Scripts with Linux*](https://www.intel.com/content/www/us/en/docs/oneapi/programming-guide/current/use-the-setvars-script-with-linux-or-macos.html)*.

### On Linux*

1. Change to the sample directory.

2. Run `make` to build a release version of the sample.
   ```
   make
   ```
   Alternatively, you can build the debug version.
   ```
   make debug
   ```

3. Clean the project files. (Optional)
   ```
   make clean
   ```

### Troubleshooting

If an error occurs, you can get more details by running `make` with
the `VERBOSE=1` argument:
```
make VERBOSE=1
```
If you receive an error message, troubleshoot the problem using the Diagnostics Utility. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the *[Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/docs/oneapi/user-guide-diagnostic-utility/current/overview.html)* for more information on using the utility.

## Run the `Distributed Jacobian Solver SYCL/MPI` Sample

### On Linux

1. Run the sample using an `mpirun` command similar to the following:

   ```
   mpirun -n 2 -genv I_MPI_OFFLOAD=1 ./src/02_jacobian_device_mpi_one-sided_gpu_aware/mpi3_onesided_jacobian_gpu_sycl
   ```

Device-initiated communications require that you set an extra environment variable: `I_MPI_OFFLOAD_ONESIDED_DEVICE_INITIATED=1`.

If everything works, the Jacobi solver runs an iterative computation for a defined number of iterations. By default, the sample reports NORM values after every 10 computation iterations and reports the overall solver time at the end.

## Example Output

```
> mpirun -n 4 -genv I_MPI_OFFLOAD=2 ./src/02_jacobian_device_mpi_one-sided_gpu_aware/mpi3_onesided_jacobian_gpu_sycl
NORM value on iteration 10: 52.074559
NORM value on iteration 20: 30.813843
NORM value on iteration 30: 22.697284
NORM value on iteration 40: 18.277382
NORM value on iteration 50: 15.453062
NORM value on iteration 60: 13.473527
NORM value on iteration 70: 11.999518
NORM value on iteration 80: 10.853941
NORM value on iteration 90: 9.934763
NORM value on iteration 100: 9.178795
Average solver time: 0.333635(sec)
```

## License

Code samples are licensed under the MIT license. See
[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.

Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
{
  "guid": "6C6DF339-103B-4A7C-9A8D-72A48351431B",
  "name": "Distributed Jacobian Solver SYCL/MPI",
  "categories": ["Toolkit/oneAPI Libraries/MPI"],
  "description": "Distributed implementation of Jacobian solver with OpenMP/SYCL offload and MPI-3 one-sided.",
  "toolchain": [ "icpx" ],
  "dependencies": [ "compiler|icpx,icx","mpi" ],
  "languages": [ { "cpp": { "properties": { "projectOptions": [ { "projectType": "makefile" } ] } } } ],
  "targetDevice": [ "GPU" ],
  "os": [ "linux" ],
  "builder": [ "make" ],
  "ciTests": {
    "linux": [
      {
        "id": "jacobian_omp",
        "env": [
          "export I_MPI_OFFLOAD=2"
        ],
        "steps": [
          "make clean",
          "make",
          "mpirun -n 2 ./src/01_jacobian_host_mpi_one-sided/mpi3_onesided_jacobian",
          "mpirun -n 2 ./src/02_jacobian_device_mpi_one-sided_gpu_aware/mpi3_onesided_jacobian_gpu_sycl",
          "mpirun -n 2 -genv I_MPI_OFFLOAD_ONESIDED_DEVICE_INITIATED=1 ./src/03_jacobian_device_mpi_one-sided_device_initiated/mpi3_onesided_jacobian_gpu_sycl_device_initiated"
        ]
      }
    ]
  },
  "expertise": "Concepts and Functionality"
}
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
INCLUDES =
LDFLAGS = -lm
CFLAGS = -Wall -Wformat-security -Werror=format-security
CXXFLAGS = -Wall -Wformat-security -Werror=format-security
# Use icx from DPC++ oneAPI toolkit to compile. Please source DPCPP's vars.sh before compilation.
CC = mpiicx
CXX = mpiicpx
example = mpi3_onesided_jacobian

all: CFLAGS += -O2
all: CXXFLAGS += -O2
all: $(example)

debug: CFLAGS += -O0 -g
debug: CXXFLAGS += -O0 -g
debug: $(example)

% : %.c
	$(CC) $(CFLAGS) $(INCLUDES) -o $@ $< $(LDFLAGS)

% : %.cpp
	$(CXX) $(CXXFLAGS) $(INCLUDES) -o $@ $< $(LDFLAGS)

clean:
	-rm -f $(example).o $(example)
