|
1 | 1 | OpenBLAS ChangeLog |
| 2 | +==================================================================== |
| 3 | +Version 0.3.9 |
| 4 | + 1-Mar-2020 |
| 5 | + |
| 6 | + common: |
| 7 | + * Fixed a miscompilation of the GETRF functions with CMAKE |
| 8 | + * Imported bugfix 390 from LAPACK (missing NaN propagation in xCOMBSSQ) |
| 9 | + * The size of the memory buffer used for splitting GEMM tasks across |
| 10 | + multiple threads can now be configured in the build system. |
| 11 | + |
| 12 | +POWER: |
| 13 | + * Fixed several compilation problems related to endianness |
| 14 | + and ELF version on POWER8 and POWER9 |
| 15 | + * Fixed use of the absolute value IAMIN/IAMAX instead of IMIN/IMAX |
| 16 | + * Fixed a race condition in the level3 blas code |
| 17 | + |
| 18 | +MIPS64: |
| 19 | + * Fixed use of the absolute value IAMIN/IAMAX instead of IMIN/IMAX |
| 20 | + |
| 21 | +ARMV7: |
| 22 | + * Fixed a race condition in the level3 blas code |
| 23 | + * Fixed compilation on Android |
| 24 | +ARMV8: |
| 25 | + * Added support for Ampere EMAG8180 |
| 26 | + * Added support for Neoverse N1 |
| 27 | + * Improved performance of the blas_lock function |
| 28 | + * Fixed a race condition in the level3 blas code |
| 29 | + * Fixed a performance regression on TSV110-based servers |
| 30 | + |
| 31 | +x86_64: |
| 32 | + * Fixed a long-standing error with undeclared register overwrites |
| 33 | + in the DSCAL microkernel for HASWELL,SKYLAKEX and ZEN |
| 34 | + * Fixed a long-standing bug in the SSE implementation of IAMAX |
| 35 | + * Fixed a CMAKE build failure with DYNAMIC_ARCH |
| 36 | + * Fixed cpu autodetection of Goldmont+, Cannon Lake and Ice Lake |
| 37 | + * Fixed a compilation failure on OSX with compiler name containing dash |
| 38 | + * Fixed compilation with MinGW on SkylakeX |
| 39 | + * Improved speed of the AVX512 GEMM3M kernel on SkylakeX |
| 40 | + * Added an AVX512 STRMM kernel for SkylakeX |
| 41 | + * Improved GEMM performance on Haswell and Zen |
| 42 | + |
| 43 | +zarch: |
| 44 | + * Fixed compilation of the DYNAMIC_ARCH code |
| 45 | + |
| 46 | +==================================================================== |
| 47 | +Version 0.3.8 |
| 48 | + 9-Feb-2020 |
| 49 | + |
| 50 | +common: |
| 51 | + * LAPACK has been updated to 3.9.0 (plus patches up to |
| 52 | + January 2nd, 2020) |
| 53 | + * CMAKE support has been improved in several areas including |
| 54 | + cross-compilation |
| 55 | + * a thread race condition in the GEMM3M kernels was resolved |
| 56 | + * the "generic" (plain C) gemm beta kernel used by many targets |
| 57 | + has been sped up |
| 58 | + * an optimized version of the LAPACK trtrs functions has been added |
| 59 | + * an incompatibility between the LAPACK tests and the OpenBLAS |
| 60 | + implementation of XERBLA was resolved, removing the numerous |
| 61 | + warnings about wrong error exits in the former |
| 62 | + * support for NetBSD has been added |
| 63 | + * support for compilation with g95 and non-GNU versions of ld |
| 64 | + has been improved |
| 65 | + * support for compilation with (upcoming) gcc 10 has been added |
| 66 | + |
| 67 | +POWER: |
| 68 | + * worked around miscompilation of several POWER8 and POWER9 |
| 69 | + kernels by older versions of gcc |
| 70 | + * added support for big-endian POWER8 and for compilation on AIX |
| 71 | + * corrected bugs in the big-endian support for PPC440 and PPC970 |
| 72 | + * DYNAMIC_ARCH support is now available in CMAKE builds as well |
| 73 | + |
| 74 | +ARMV8: |
| 75 | + * performance of DGEMM_BETA and SGEMM_NCOPY has been improved |
| 76 | + * compilation for 32bit works again |
| 77 | + * performance of the RPCC function has been improved |
| 78 | + * improved performance on small systems |
| 79 | + * DYNAMIC_ARCH support is now available in CMAKE builds as well |
| 80 | + * cross-compilation from OSX to IOS was simplified |
| 81 | + |
| 82 | +x86_64: |
| 83 | + * a new AVX512 DGEMM kernel was added and the AVX512 SGEMM kernel |
| 84 | + was significantly improved |
| 85 | + * optimized AVX512 kernels for CGEMM and ZGEMM have been added |
| 86 | + * AVX2 kernels for STRMM, SGEMM, and CGEMM have been significantly |
| 87 | + sped up and optimized CGEMM3M and ZGEMM3M kernels have been added |
| 88 | + * added support for QEMU virtual cpus |
| 89 | + * a compilation problem with PGI and SUN compilers was fixed |
| 90 | + * Intel "Goldmont plus" is now autodetected |
| 91 | + * a potential crash on program exit on MS Windows has been fixed |
| 92 | + |
| 93 | +x86: |
| 94 | + * an unwanted case sensitivity in the implementation of LSAME |
| 95 | + on older 32bit AMD cpus was fixed |
| 96 | + |
| 97 | +zarch: |
| 98 | + * Z15 is now supported as Z14 |
| 99 | + * DYNAMIC_ARCH is now available on ZARCH as well |
| 100 | + |
2 | 101 | ==================================================================== |
3 | 102 | Version 0.3.7 |
4 | 103 | 11-Aug 2019 |
5 | 104 |
|
6 | 105 | common: |
7 | | - * having the gmake special variables TARGET_ARCH or TARGET_MACH |
8 | | - defined no longer causes build failures in ctest or utest |
9 | | - * defining NO_AFFINITY or USE_TLS to 0 in gmake builds no longer |
10 | | - has the same effect as setting them to 1 |
11 | | - * a new test program was added to allow checking the library for |
12 | | - thread safety |
13 | | - * a new option USE_LOCKING was added to ensure thread safety when |
14 | | - OpenBLAS itself is built without multithreading but will be |
15 | | - called from multiple threads. |
16 | | - * a build failure on Linux with glibc versions earlier than 2.5 |
17 | | - was fixed |
18 | | - * a runtime error with CPU enumeration (and NO_AFFINITY not set) |
19 | | - on glibc 2.6 was fixed |
20 | | - * NO_AFFINITY was added to the CMAKE options (and defaults to being |
21 | | - active on Linux, as in the gmake builds) |
| 106 | + * having the gmake special variables TARGET_ARCH or TARGET_MACH |
| 107 | + defined no longer causes build failures in ctest or utest |
| 108 | + * defining NO_AFFINITY or USE_TLS to 0 in gmake builds no longer |
| 109 | + has the same effect as setting them to 1 |
| 110 | + * a new test program was added to allow checking the library for |
| 111 | + thread safety |
| 112 | + * a new option USE_LOCKING was added to ensure thread safety when |
| 113 | + OpenBLAS itself is built without multithreading but will be |
| 114 | + called from multiple threads. |
| 115 | + * a build failure on Linux with glibc versions earlier than 2.5 |
| 116 | + was fixed |
| 117 | + * a runtime error with CPU enumeration (and NO_AFFINITY not set) |
| 118 | + on glibc 2.6 was fixed |
| 119 | + * NO_AFFINITY was added to the CMAKE options (and defaults to being |
| 120 | + active on Linux, as in the gmake builds) |
22 | 121 |
|
23 | 122 | x86_64: |
24 | | - * the build-time logic for detection of AVX512 availability in |
25 | | - the processor and compiler was fixed |
26 | | - * gmake builds on OSX now set the internal name of the library to |
27 | | - libopenblas.0.dylib (consistent with CMAKE) |
28 | | - * the Haswell DGEMM kernel received a significant speedup through |
29 | | - improved prefetch and load instructions |
30 | | - * performance of DGEMM, DTRMM, DTRSM and ZDOT on Zen/Zen2 was markedly |
31 | | - increased by avoiding vpermpd instructions |
32 | | - * the SKYLAKEX (AVX512) DGEMM helper functions have now been disabled |
33 | | - to fix remaining errors in DGEMM, DSYMM and DTRMM |
34 | | - |
35 | | -## POWER: |
36 | | - * added support for building on FreeBSD/powerpc64 and FreeBSD/ppc970 |
37 | | - * added optimized kernels for POWER9 SGEMM and STRMM |
38 | | - |
39 | | -## ARMV7: |
40 | | - * fixed the softfp implementations of xAMAX and IxAMAX |
41 | | - * removed the predefined -march= flags on both ARMV5 and ARMV6 as |
42 | | - they were appropriate for only a subset of platforms |
| 123 | + * the build-time logic for detection of AVX512 availability in |
| 124 | + the processor and compiler was fixed |
| 125 | + * gmake builds on OSX now set the internal name of the library to |
| 126 | + libopenblas.0.dylib (consistent with CMAKE) |
| 127 | + * the Haswell DGEMM kernel received a significant speedup through |
| 128 | + improved prefetch and load instructions |
| 129 | + * performance of DGEMM, DTRMM, DTRSM and ZDOT on Zen/Zen2 was markedly |
| 130 | + increased by avoiding vpermpd instructions |
| 131 | + * the SKYLAKEX (AVX512) DGEMM helper functions have now been disabled |
| 132 | + to fix remaining errors in DGEMM, DSYMM and DTRMM |
| 133 | + |
| 134 | +POWER: |
| 135 | + * added support for building on FreeBSD/powerpc64 and FreeBSD/ppc970 |
| 136 | + * added optimized kernels for POWER9 SGEMM and STRMM |
| 137 | + |
| 138 | +ARMV7: |
| 139 | + * fixed the softfp implementations of xAMAX and IxAMAX |
| 140 | + * removed the predefined -march= flags on both ARMV5 and ARMV6 as |
| 141 | + they were appropriate for only a subset of platforms |
43 | 142 |
|
44 | 143 | ==================================================================== |
45 | 144 | Version 0.3.6 |
|