| 1 | +# `Distributed Jacobian Solver SYCL/MPI` Sample |
| 2 | + |
| 3 | +The `Distributed Jacobian Solver SYCL/MPI` sample demonstrates how to use the GPU-aware MPI-3 one-sided communications available in the Intel® MPI Library. |
| 4 | + |
| 5 | +| Area | Description |
| 6 | +|:--- |:-- |
| 7 | +| What you will learn | How to use MPI-3 one-sided communications with GPU buffers and SYCL* offload to reach better compute/communication overlap. |
| 8 | +| Time to complete | 45 minutes |
| 9 | +| Category | Concepts and Functionality |
| 10 | + |
| 11 | +For more information on Intel® MPI Library and complete documentation of all features, |
| 12 | +see the [Intel® MPI Library Documentation](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library-documentation.html) page. |
| 13 | + |
| 14 | +## Purpose |
| 15 | + |
| 16 | +The sample demonstrates a practical use case (a Jacobian solver) for MPI-3 one-sided communications, which allow compute kernels to overlap with communication. The sample illustrates how to use host- and device-initiated one-sided communication with SYCL kernels. |
| 17 | + |
| 18 | +## Prerequisites |
| 19 | + |
| 20 | +| Optimized for | Description |
| 21 | +|:--- |:--- |
| 22 | +| OS | Linux* |
| 23 | +| Hardware | 4th Generation Intel® Xeon® Scalable Processors <br> Intel® Data Center GPU Max Series |
| 24 | +| Software | Intel® MPI Library 2021.11 |
| 25 | + |
| 26 | +## Key Implementation Details |
| 27 | + |
| 28 | +This sample implements a well-known distributed 2D Jacobian solver with 1D data distribution. The sample uses Intel® MPI [GPU Support](https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/current/gpu-support.html). |
| 29 | + |
| 30 | +The sample has three variants demonstrating different approaches to the Jacobi solver. |
| 31 | + |
| 32 | +### `01_jacobian_host_mpi_one-sided` |
| 33 | + |
| 34 | +This program demonstrates a baseline implementation of the distributed Jacobian solver. In this sample you will see the basic idea of the algorithm, as well as how to implement the halo exchange that this solver requires using MPI-3 one-sided primitives. |
| 35 | + |
| 36 | +The solver is an iterative algorithm. Each iteration first recalculates the border values and then transfers them to the neighbor processes, which use them in the next iteration. Each process recalculates its internal point values for the next iteration in parallel with the communication. After a number of iterations, the algorithm reports NORM values for validation purposes. |
| 37 | + |
| 38 | +### `02_jacobian_device_mpi_one-sided_gpu_aware` |
| 39 | + |
| 40 | +This program demonstrates how the same algorithm can be modified to add GPU offload capability. The program comes in two versions: OpenMP and SYCL. The program illustrates how device memory can be passed directly to MPI one-sided primitives. In particular, device memory may be passed to the `MPI_Win_create` call to create an RMA window placed on a device. In addition to device placement of the RMA window, device memory can be passed to the `MPI_Put`/`MPI_Get` primitives as a target or origin buffer. |
| 41 | + |
| 42 | +> **Note**: Only contiguous MPI datatypes are supported. |
| 43 | + |
| 44 | +### `03_jacobian_device_mpi_one-sided_device_initiated` |
| 45 | + |
| 46 | +This program demonstrates how to initiate one-sided communications directly from the offloaded code. The Intel® MPI Library allows calls to some communication primitives directly from the offloaded code (SYCL or OpenMP). This is the list of supported primitives: |
| 47 | + |
| 48 | +- `MPI_Put` |
| 49 | +- `MPI_Get` |
| 50 | +- `MPI_Win_lock` / `MPI_Win_lock_all` |
| 51 | +- `MPI_Win_unlock` / `MPI_Win_unlock_all` |
| 52 | +- `MPI_Win_flush` / `MPI_Win_flush_all` |
| 53 | +- `MPI_Win_fence` |
| 54 | + |
| 55 | +To enable device-initiated communications, you must set an extra environment variable: `I_MPI_OFFLOAD_ONESIDED_DEVICE_INITIATED=1`. |
| 56 | + |
| 57 | +## Build the `Distributed Jacobian Solver SYCL/MPI` Sample |
| 58 | + |
| 59 | +> **Note**: If you have not already done so, set up your CLI |
| 60 | +> environment by sourcing the `setvars` script in the root of your oneAPI installation. |
| 61 | +> |
| 62 | +> Linux*: |
| 63 | +> - For system-wide installations: `. /opt/intel/oneapi/setvars.sh` |
| 64 | +> - For private installations: `. ~/intel/oneapi/setvars.sh` |
| 65 | +> - For non-POSIX shells, like csh, use the following command: `bash -c 'source <install-dir>/setvars.sh ; exec csh'` |
| 66 | +> |
| 67 | +> For more information on configuring environment variables, or if you are using a Unified Directory Layout, see |
| 68 | +*[Use the setvars and oneapi-vars Scripts with Linux*](https://www.intel.com/content/www/us/en/docs/oneapi/programming-guide/current/use-the-setvars-script-with-linux-or-macos.html)*. |
| 69 | + |
| 70 | +### On Linux* |
| 71 | + |
| 72 | +1. Change to the sample directory. |
| 73 | + |
| 74 | +2. Run `make` to build a release version of the sample. |
| 75 | + ``` |
| 76 | + make |
| 77 | + ``` |
| 78 | + Alternatively, you can build the debug version. |
| 79 | + ``` |
| 80 | + make debug |
| 81 | + ``` |
| 82 | + |
| 83 | +3. Clean the project files. (Optional) |
| 84 | + ``` |
| 85 | + make clean |
| 86 | + ``` |
| 87 | + |
| 88 | +### Troubleshooting |
| 89 | + |
| 90 | +If an error occurs, you can get more details by running `make` with |
| 91 | +the `VERBOSE=1` argument: |
| 92 | +``` |
| 93 | +make VERBOSE=1 |
| 94 | +``` |
| 95 | +If you receive an error message, troubleshoot the problem using the Diagnostics Utility. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the *[Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/docs/oneapi/user-guide-diagnostic-utility/current/overview.html)* for more information on using the utility. |
| 96 | + |
| 97 | +## Run the `Distributed Jacobian Solver SYCL/MPI` Sample |
| 98 | + |
| 99 | +### On Linux* |
| 100 | + |
| 101 | +1. Run the sample using an `mpirun` command similar to the following: |
| 102 | + |
| 103 | + ``` |
| 104 | + mpirun -n 2 -genv I_MPI_OFFLOAD=1 ./src/02_jacobian_device_mpi_one-sided_gpu_aware/mpi3_onesided_jacobian_gpu_sycl |
| 105 | + ``` |
| 106 | + |
| 107 | +Device-initiated communications require that you set an extra environment variable: `I_MPI_OFFLOAD_ONESIDED_DEVICE_INITIATED=1`. |
| 108 | + |
| 109 | +If everything works, the Jacobi solver runs an iterative computation for a defined number of iterations. By default, the sample reports NORM values after every 10 computation iterations and reports the overall solver time at the end. |
| 110 | + |
| 111 | +## Example Output |
| 112 | + |
| 113 | +``` |
| 114 | +> mpirun -n 4 -genv I_MPI_OFFLOAD=2 ./src/02_jacobian_device_mpi_one-sided_gpu_aware/mpi3_onesided_jacobian_gpu_sycl |
| 115 | +NORM value on iteration 10: 52.074559 |
| 116 | +NORM value on iteration 20: 30.813843 |
| 117 | +NORM value on iteration 30: 22.697284 |
| 118 | +NORM value on iteration 40: 18.277382 |
| 119 | +NORM value on iteration 50: 15.453062 |
| 120 | +NORM value on iteration 60: 13.473527 |
| 121 | +NORM value on iteration 70: 11.999518 |
| 122 | +NORM value on iteration 80: 10.853941 |
| 123 | +NORM value on iteration 90: 9.934763 |
| 124 | +NORM value on iteration 100: 9.178795 |
| 125 | +Average solver time: 0.333635(sec) |
| 126 | +``` |
| 127 | + |
| 128 | +## License |
| 129 | + |
| 130 | +Code samples are licensed under the MIT license. See |
| 131 | +[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details. |
| 132 | + |
| 133 | +Third-party program licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt). |