Overview
GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and nonlinear problems, and for performing other numerical experiments. It also provides extensive graphics capabilities for data visualization and manipulation. See the GNU Octave Wiki for more information.
Building from Source
GNU Octave has several build dependencies, which can be seen here. This post covers only the BLAS, LAPACK, and FFTW3 libraries, because those three can be accelerated by the Intel Math Kernel Library (Intel MKL); the default option is to stick with free libraries such as Netlib’s BLAS or OpenBLAS, Netlib’s LAPACK, and FFTW.
Although the build instructions on the Wiki suggest those free libraries, I chose AMD AOCL as the free alternative to compare against Intel MKL, because it generally performed better than the other free math libraries in our internal tests.
Linking GNU Octave with Intel MKL’s FFT
By default, GNU Octave’s configure script doesn’t support linking with Intel MKL’s FFT and falls back to FFTPACK, which is much slower. To use the MKL versions of FFTW3 and FFTW3F, you have to modify the existing configure script manually.
Before editing, generate your own MKL link line with the Intel Math Kernel Library Link Line Advisor. Make sure every option matches your target machine: OS, compiler, architecture, and so on.
For example, I used GNU compiler 9.3.0 and GNU OpenMP, so my link line is:
-L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
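Before going further, it may be worth confirming that the libraries named in the link line really exist under MKLROOT. The following is a minimal sketch, not part of the original build steps; the function name check_mkl_libs is made up for illustration, and it assumes the lib/intel64 layout used in the link line above.

```shell
# Sketch: report any MKL library from the link line that is missing.
# check_mkl_libs is a hypothetical helper name; $1 is the MKL root directory.
check_mkl_libs() {
  root="$1"
  missing=0
  for lib in mkl_gf_lp64 mkl_gnu_thread mkl_core; do
    # Accept either the shared (.so) or the static (.a) variant.
    if [ ! -e "$root/lib/intel64/lib$lib.so" ] && [ ! -e "$root/lib/intel64/lib$lib.a" ]; then
      echo "missing: lib$lib"
      missing=1
    fi
  done
  return "$missing"
}

# Usage sketch:
# check_mkl_libs "$MKLROOT" && echo "all MKL libraries found"
```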
After that, put the link line into the configure file, which you can find in the root directory of the source tree.
Find these exact lines, which link Octave with the FFTW3 and FFTW3F libraries respectively:
ac_octave_fftw3_pkg_check=no
FFTW3_LIBS=
warn_fftw3="FFTW3 library not found. The slower FFTPACK library will be used instead."
case $with_fftw3 in
no)
warn_fftw3="--without-fftw3 specified. Functions or features that depend on FFTW3 will be disabled."
FFTW3_LIBS=
;;
yes | "")
ac_octave_fftw3_pkg_check=yes
FFTW3_LIBS="-lfftw3"
;;
-* | */* | *.a | *.so | *.so.* | *.o)
FFTW3_LIBS="$with_fftw3"
;;
*)
FFTW3_LIBS="-l$with_fftw3"
;;
esac
and
ac_octave_fftw3f_pkg_check=no
FFTW3F_LIBS=
warn_fftw3f="FFTW3F library not found. The slower FFTPACK library will be used instead."
case $with_fftw3f in
no)
warn_fftw3f="--without-fftw3f specified. Functions or features that depend on FFTW3F will be disabled."
FFTW3F_LIBS=
;;
yes | "")
ac_octave_fftw3f_pkg_check=yes
FFTW3F_LIBS="-lfftw3f"
;;
-* | */* | *.a | *.so | *.so.* | *.o)
FFTW3F_LIBS="$with_fftw3f"
;;
*)
FFTW3F_LIBS="-l$with_fftw3f"
;;
esac
Then, edit the yes | "" branch of each block so that it uses the MKL link line:
ac_octave_fftw3_pkg_check=no
FFTW3_LIBS=
warn_fftw3="FFTW3 library not found. The slower FFTPACK library will be used instead."
case $with_fftw3 in
no)
warn_fftw3="--without-fftw3 specified. Functions or features that depend on FFTW3 will be disabled."
FFTW3_LIBS=
;;
yes | "")
ac_octave_fftw3_pkg_check=yes
FFTW3_LIBS="-L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl"
;;
-* | */* | *.a | *.so | *.so.* | *.o)
FFTW3_LIBS="$with_fftw3"
;;
*)
FFTW3_LIBS="-l$with_fftw3"
;;
esac
and
ac_octave_fftw3f_pkg_check=no
FFTW3F_LIBS=
warn_fftw3f="FFTW3F library not found. The slower FFTPACK library will be used instead."
case $with_fftw3f in
no)
warn_fftw3f="--without-fftw3f specified. Functions or features that depend on FFTW3F will be disabled."
FFTW3F_LIBS=
;;
yes | "")
ac_octave_fftw3f_pkg_check=yes
FFTW3F_LIBS="-L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl"
;;
-* | */* | *.a | *.so | *.so.* | *.o)
FFTW3F_LIBS="$with_fftw3f"
;;
*)
FFTW3F_LIBS="-l$with_fftw3f"
;;
esac
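The manual edit above could also be scripted. The following is a minimal sketch under stated assumptions: the function name patch_fftw_libs is made up, it uses my link line (substitute your own), and it assumes the two default assignments appear in configure exactly as shown above. It keeps a .orig copy so the change can be reviewed with diff.

```shell
# Sketch: replace the default FFTW3/FFTW3F libraries in configure with an MKL link line.
# patch_fftw_libs is a hypothetical helper name; $1 is the path to configure.
patch_fftw_libs() {
  file="$1"
  # Single quotes keep ${MKLROOT} literal, so configure expands it at run time.
  mkl_line='-L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl'
  cp "$file" "$file.orig"
  # Only the exact default assignments are rewritten; other branches stay untouched.
  sed -e "s|^\([[:space:]]*FFTW3_LIBS=\)\"-lfftw3\"|\1\"$mkl_line\"|" \
      -e "s|^\([[:space:]]*FFTW3F_LIBS=\)\"-lfftw3f\"|\1\"$mkl_line\"|" \
      "$file.orig" > "$file"
}

# Usage sketch:
# patch_fftw_libs configure && diff configure.orig configure
```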
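Next, run autoreconf to reconfigure. Check afterwards that your edits are still present in configure, because autoreconf may regenerate that file from configure.ac.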
$ autoreconf
Finally, you can link Intel MKL’s FFT as GNU Octave’s FFTW3 and FFTW3F dependencies by pointing configure at the Intel MKL lib and include directories. Also make sure to link BLAS and LAPACK using the link line generated above.
$ ./configure \
--with-fftw3-includedir=$MKLROOT/include/fftw \
--with-fftw3-libdir=$MKLROOT/lib/intel64 \
--with-fftw3f-includedir=$MKLROOT/include/fftw \
--with-fftw3f-libdir=$MKLROOT/lib/intel64 \
--with-blas="-L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl" \
--with-lapack="-L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl" \
...
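Once the build finishes, one way to check that MKL was actually picked up is to inspect the shared libraries the result depends on. This is a rough sketch, not something from the original build: links_mkl is a made-up helper, the path to liboctave is a placeholder that depends on your install prefix, and the check only applies when MKL is linked dynamically.

```shell
# Sketch: succeed if ldd-style output on stdin references an MKL shared library.
# links_mkl is a hypothetical helper name.
links_mkl() {
  grep -q 'libmkl_'
}

# Usage sketch (replace the placeholder path with your actual liboctave):
# ldd /path/to/liboctave.so | links_mkl && echo "MKL is linked"
```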
Performance
To find out how much speedup Intel MKL gives, I ran a quick test against the FOSS counterpart. The machine configuration used for the test is listed below. For hardware, I used an ASUS ESC4000A-E10 provided by ASUS (Thank you, ASUS!).
Type | Model |
---|---|
CPU | AMD EPYC 7502P |
RAM | 8-channel 128GB DDR4-3200 |
OS | CentOS 7.7 |
Kernel | 4.4.238-1.el7.elrepo.x86_64 |
Below are the compiler and libraries used in the tests.
Type | Intel MKL | FOSS | Notes |
---|---|---|---|
Compiler | GNU compiler 9.3.0 | GNU compiler 9.3.0 | FLAGS=-march=znver2 |
BLAS | Intel MKL 2020.0 | AMD BLIS AOCL 2.2 | MKL was tested using MKL_DEBUG_CPU_TYPE=5. Read here for more info. AMD BLIS was compiled using GCC 10.1.0 and FLAGS=-march=znver2 |
LAPACK | Intel MKL 2020.0 | AMD libflame AOCL 2.2 | MKL was tested using MKL_DEBUG_CPU_TYPE=5. Read here for more info. AMD libflame was compiled using GCC 10.1.0 and FLAGS=-march=znver2 |
FFTW3/FFTW3F | Intel MKL 2020.0 | AMD Optimized FFTW AOCL 2.2 | MKL was tested using MKL_DEBUG_CPU_TYPE=5. Read here for more info. AMD Optimized FFTW was compiled using GCC 10.1.0 and FLAGS=-march=znver2 |
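MKL_DEBUG_CPU_TYPE in the Notes column is an environment variable, so it has to be present in the environment of the Octave process. A minimal sketch of setting it for a single command only; the helper name with_mkl_cpu_type and the script name benchmark.m are made up for illustration, and the variable’s effect may differ in MKL versions other than the 2020.0 release tested here.

```shell
# Sketch: run a command with MKL_DEBUG_CPU_TYPE=5 set only for that command.
# with_mkl_cpu_type is a hypothetical helper name.
with_mkl_cpu_type() {
  env MKL_DEBUG_CPU_TYPE=5 "$@"
}

# Usage sketch (benchmark.m is a hypothetical script name):
# with_mkl_cpu_type octave benchmark.m
```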
I used a benchmarking script for GNU Octave, courtesy of Harris Georgiou from the University of Piraeus, which can be found here. The script consists of several run-time tests: pseudo-inverse matrix, linear equation system, linear regression, singular value decomposition, fast Fourier transform, and bubblesort. I set the iteration count to 30 and the size parameter to 2000 (Nlp and Nsz in the script itself), and limited the test to 32 threads with the SLURM scheduler.
fprintf('Benchmark suite 0.9b (.m) - Harris Georgiou (c) 2018\n\n');
clear all;
Nsz=2000;
Nlp=30;
The results show that GNU Octave built with Intel MKL is generally faster than the FOSS counterpart. Here are the results:
Test | Intel MKL runtime (seconds, lower is better) | FOSS runtime (seconds, lower is better) | FOSS runtime / Intel MKL runtime (%, above 100% means MKL is faster) |
---|---|---|---|
Pseudo-inverse | 0.722554 | 0.917274 | 126.95% |
Linear equation system | 0.36448 | 0.191732 | 52.60% |
Linear regression | 0.224669 | 1.83595 | 817.18% |
Singular value decomposition | 2.53434 | 3.56501 | 140.67% |
Fast fourier transform | 0.0452899 | 0.0784928 | 173.31% |
Bubblesort | 43.3166 | 43.5649 | 100.57% |
As seen above, GNU Octave linked with Intel MKL lost only once against FOSS, in the Linear equation system test, where it took almost twice as long. In the other tests MKL won, with the FOSS runtime as a percentage of the MKL runtime ranging from 100.57% in Bubblesort (which may be within the margin of error) to a whopping 817.18% in Linear regression, or roughly 8.2 times faster. Granted, Linear regression probably gains most of its performance from a better BLAS/LAPACK implementation, but the Fast Fourier transform test shows that MKL’s FFT also beat AMD Optimized FFTW, at 173.31%. Therefore, it’s fair to say that linking GNU Octave with Intel MKL generally gave better performance than the FOSS libraries in this benchmark.
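The percentages in the last column of the table match the FOSS runtime divided by the Intel MKL runtime, times 100. A small sketch to reproduce them from the measured runtimes; the helper name speedup_pct is made up for illustration.

```shell
# Sketch: relative performance in percent = FOSS runtime / MKL runtime * 100.
# speedup_pct is a hypothetical helper name; $1 = MKL runtime, $2 = FOSS runtime.
speedup_pct() {
  awk -v mkl="$1" -v foss="$2" 'BEGIN { printf "%.2f\n", foss / mkl * 100 }'
}

speedup_pct 0.224669 1.83595   # Linear regression: prints 817.18
speedup_pct 0.36448 0.191732   # Linear equation system: prints 52.60
```

A value above 100 means MKL finished faster; a value below 100, as in the Linear equation system test, means the FOSS build was faster.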
Closing Words
GNU Octave is good software with good support for FOSS libraries. Some people will be happy to build GNU Octave around FOSS BLAS/LAPACK/FFTW libraries. But as we can see above, by adjusting GNU Octave to take advantage of the Intel Math Kernel Library (which is, of course, not FOSS), we can improve its performance by up to 8x compared to FOSS in these tests. Hopefully this post helps you build your own Octave linked with Intel MKL and get the most out of your machine.