Optimizing GNU Octave 5.2.0 with Intel Math Kernel Library

Overview

GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and nonlinear problems, and for performing other numerical experiments. It also provides extensive graphics capabilities for data visualization and manipulation. See the GNU Octave Wiki for more info.

Building from Source

GNU Octave has several build dependencies, which can be seen here. This post covers only the BLAS, LAPACK, and FFTW3 libraries, mainly because those three can be accelerated by Intel Math Kernel Library (Intel MKL), whereas the default option is to stick with free libraries like Netlib’s BLAS/OpenBLAS, Netlib’s LAPACK, and FFTW.

While it is possible to test with the free libraries suggested in the build instructions on the Wiki, I chose AMD AOCL as the free alternative to Intel MKL. AMD AOCL was chosen because it generally performed better than the other free math libraries in our internal tests.

Linking GNU Octave with Intel MKL’s FFT

By default, GNU Octave’s configure script doesn’t support linking with Intel MKL’s FFT and falls back to FFTPACK, which is much slower. To use the MKL versions of FFTW3 and FFTW3F, you have to modify the existing configure script manually.

Before editing, generate your own MKL link line with the Intel Math Kernel Library Link Line Advisor. Make sure every option matches your target machine: OS, compiler, architecture, and so on.

For example, this is my configuration, since I used GNU compiler 9.3.0 and GNU OpenMP:

Figure: Intel MKL linking line example

Therefore, my linking line would be:

-L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
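Before pasting this line anywhere, it may be worth confirming that MKLROOT points at an installation that actually contains the libraries the line names. The following is only a minimal sketch: the function name check_mkl_libs is hypothetical, and it assumes the shared libraries are named lib<name>.so and live under lib/intel64, as the link line above implies.

```shell
# check_mkl_libs ROOT
# Prints one "missing: <path>" line for each MKL library from the link line
# that is not found under ROOT/lib/intel64 (assumed layout).
# Returns 0 only if all three libraries are present.
check_mkl_libs() {
  root=$1
  missing=0
  for lib in mkl_gf_lp64 mkl_gnu_thread mkl_core; do
    path="$root/lib/intel64/lib$lib.so"
    if [ ! -e "$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_mkl_libs "$MKLROOT"
```

If the function prints nothing and returns success, the three MKL libraries in the link line are where the -L flag expects them.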

After that, put the linking line into the configure file, which you can find in the root directory of the source.

Root dir of GNU Octave source

Now, update configure. Find these exact lines, which handle linking Octave with the FFTW3 and FFTW3F libraries respectively:

  ac_octave_fftw3_pkg_check=no
  FFTW3_LIBS=
  warn_fftw3="FFTW3 library not found.  The slower FFTPACK library will be used instead."
  case $with_fftw3 in
    no)
      warn_fftw3="--without-fftw3 specified.  Functions or features that depend on FFTW3 will be disabled."
         FFTW3_LIBS=
    ;;
    yes | "")
      ac_octave_fftw3_pkg_check=yes
      FFTW3_LIBS="-lfftw3"
    ;;
    -* | */* | *.a | *.so | *.so.* | *.o)
      FFTW3_LIBS="$with_fftw3"
    ;;
    *)
      FFTW3_LIBS="-l$with_fftw3"
    ;;
  esac

and

  ac_octave_fftw3f_pkg_check=no
  FFTW3F_LIBS=
  warn_fftw3f="FFTW3F library not found.  The slower FFTPACK library will be used instead."
  case $with_fftw3f in
    no)
      warn_fftw3f="--without-fftw3f specified.  Functions or features that depend on FFTW3F will be disabled."
         FFTW3F_LIBS=
    ;;
    yes | "")
      ac_octave_fftw3f_pkg_check=yes
      FFTW3F_LIBS="-lfftw3f"
    ;;
    -* | */* | *.a | *.so | *.so.* | *.o)
      FFTW3F_LIBS="$with_fftw3f"
    ;;
    *)
      FFTW3F_LIBS="-l$with_fftw3f"
    ;;
  esac

Then edit them as follows. Only the FFTW3_LIBS and FFTW3F_LIBS assignments in the yes | "" branches change:

  ac_octave_fftw3_pkg_check=no
  FFTW3_LIBS=
  warn_fftw3="FFTW3 library not found.  The slower FFTPACK library will be used instead."
  case $with_fftw3 in
    no)
      warn_fftw3="--without-fftw3 specified.  Functions or features that depend on FFTW3 will be disabled."
         FFTW3_LIBS=
    ;;
    yes | "")
      ac_octave_fftw3_pkg_check=yes
      FFTW3_LIBS="-L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl"
    ;;
    -* | */* | *.a | *.so | *.so.* | *.o)
      FFTW3_LIBS="$with_fftw3"
    ;;
    *)
      FFTW3_LIBS="-l$with_fftw3"
    ;;
  esac

and

  ac_octave_fftw3f_pkg_check=no
  FFTW3F_LIBS=
  warn_fftw3f="FFTW3F library not found.  The slower FFTPACK library will be used instead."
  case $with_fftw3f in
    no)
      warn_fftw3f="--without-fftw3f specified.  Functions or features that depend on FFTW3F will be disabled."
         FFTW3F_LIBS=
    ;;
    yes | "")
      ac_octave_fftw3f_pkg_check=yes
      FFTW3F_LIBS="-L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl"
    ;;
    -* | */* | *.a | *.so | *.so.* | *.o)
      FFTW3F_LIBS="$with_fftw3f"
    ;;
    *)
      FFTW3F_LIBS="-l$with_fftw3f"
    ;;
  esac
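The same two-line change can also be scripted rather than typed by hand. This is a rough sketch, not part of the original procedure: the variable MKL_LINE and the function patch_fftw_libs are hypothetical names, and it assumes the default assignments appear in configure exactly as quoted above.

```shell
# MKL link line from the post. Single quotes keep ${MKLROOT} literal so that
# configure expands it when it runs, as in the hand-edited version.
MKL_LINE='-L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl'

# patch_fftw_libs FILE
# Rewrites the two default assignments in place and leaves the original
# as FILE.bak. Other FFTW3_LIBS/FFTW3F_LIBS assignments are not touched.
patch_fftw_libs() {
  sed -i.bak \
    -e "s|FFTW3_LIBS=\"-lfftw3\"|FFTW3_LIBS=\"$MKL_LINE\"|" \
    -e "s|FFTW3F_LIBS=\"-lfftw3f\"|FFTW3F_LIBS=\"$MKL_LINE\"|" \
    "$1"
}

# Usage (from the source root): patch_fftw_libs configure
```

The | delimiter is used in the sed expressions because the link line itself contains slashes. Comparing configure with configure.bak afterwards shows exactly which lines changed.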

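Next, run autoreconf to reconfigure. Without --force, autoreconf only regenerates files that are older than their sources, so the freshly edited configure should be left in place. It is still worth checking afterwards that the MKL linking line is present in configure.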

$ autoreconf

Finally, you can link Intel MKL’s FFT as GNU Octave’s FFTW3 and FFTW3F dependencies by pointing configure at the Intel MKL lib and include directories. Also make sure to link BLAS and LAPACK using the linking line generated above.

$ ./configure \
--with-fftw3-includedir=$MKLROOT/include/fftw \
--with-fftw3-libdir=$MKLROOT/lib/intel64 \
--with-fftw3f-includedir=$MKLROOT/include/fftw \
--with-fftw3f-libdir=$MKLROOT/lib/intel64 \
--with-blas="-L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl" \
--with-lapack="-L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl" \
...
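Once the build has finished, one way to confirm that MKL was really picked up is to look at which shared libraries the built Octave libraries resolve to. The helper below is a small sketch under assumptions: list_mkl_deps is a hypothetical name, it expects Linux ldd output on standard input, and the library path in the usage comment is a placeholder rather than the actual location in the build tree.

```shell
# list_mkl_deps
# Reads `ldd` output on stdin and prints the names of the MKL libraries it
# mentions, one per line, sorted and de-duplicated.
list_mkl_deps() {
  awk '$1 ~ /^libmkl_/ { print $1 }' | sort -u
}

# Usage (placeholder path): ldd /path/to/liboctave.so | list_mkl_deps
```

An empty result would suggest that the library was linked against something other than MKL, for example because configure silently fell back to another BLAS or FFTW.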

Performance

To find out how much speedup Intel MKL gives, I ran a quick test against its FOSS counterpart. The machine configuration used for the test is listed below. For hardware, I used an ASUS ESC4000A-E10 provided by ASUS (Thank you, ASUS!).

| Type | Model |
|---|---|
| CPU | AMD EPYC 7502P |
| RAM | 8-channel 128GB DDR4-3200 |
| OS | CentOS 7.7 |
| Kernel | 4.4.238-1.el7.elrepo.x86_64 |

Also, below are the compiler and libraries used in the tests.

| Type | Intel MKL | FOSS | Notes |
|---|---|---|---|
| Compiler | GNU compiler 9.3.0 | GNU compiler 9.3.0 | FLAGS=-march=znver2 |
| BLAS | Intel MKL 2020.0 | AMD BLIS AOCL 2.2 | MKL was tested using MKL_DEBUG_CPU_TYPE=5. Read here for more info. AMD BLIS was compiled using GCC 10.1.0 and FLAGS=-march=znver2 |
| LAPACK | Intel MKL 2020.0 | AMD libflame AOCL 2.2 | MKL was tested using MKL_DEBUG_CPU_TYPE=5. Read here for more info. AMD libflame was compiled using GCC 10.1.0 and FLAGS=-march=znver2 |
| FFTW3/FFTW3F | Intel MKL 2020.0 | AMD Optimized FFTW AOCL 2.2 | MKL was tested using MKL_DEBUG_CPU_TYPE=5. Read here for more info. AMD Optimized FFTW was compiled using GCC 10.1.0 and FLAGS=-march=znver2 |

I used a benchmarking script for GNU Octave, courtesy of Harris Georgiou from the University of Piraeus, which can be found here. The script consists of several run-time tests: pseudo-inverse matrix, linear equation system, linear regression, singular value decomposition, fast Fourier transform, and bubblesort. The iteration count is set to 30 (Nlp) and the problem size to 2000 (Nsz), both of which can be changed in the script itself. I limited the test to 32 threads using the SLURM scheduler.

fprintf('Benchmark suite 0.9b (.m) - Harris Georgiou (c) 2018\n\n');

clear all;

Nsz=2000;
Nlp=30;
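The post does not show how the 32-thread limit was applied, so the following is only one plausible way to do it. It assumes the thread count is capped through the standard OMP_NUM_THREADS and MKL_NUM_THREADS environment variables; thread_env is a hypothetical helper and benchmark.m is a hypothetical file name for the script excerpted above.

```shell
# thread_env N
# Prints environment assignments for an N-thread run on one line.
# MKL_DEBUG_CPU_TYPE=5 is the setting reported for the MKL runs in this post;
# the two *_NUM_THREADS variables are an assumption about how to cap threads.
thread_env() {
  printf 'OMP_NUM_THREADS=%s MKL_NUM_THREADS=%s MKL_DEBUG_CPU_TYPE=5\n' "$1" "$1"
}

# Usage under SLURM (hypothetical script name):
#   srun --cpus-per-task=32 env $(thread_env 32) octave --no-gui benchmark.m
```

Requesting the same number of CPUs from the scheduler as the thread cap keeps the math library from oversubscribing the cores the job was given.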

Results show that GNU Octave built with Intel MKL is generally faster than its FOSS counterpart. Here are the results:

| Test | Intel MKL runtime (seconds, lower is better) | FOSS runtime (seconds, lower is better) | Intel MKL speedup vs FOSS (FOSS runtime / MKL runtime, %) |
|---|---|---|---|
| Pseudo-inverse | 0.722554 | 0.917274 | 126.95% |
| Linear equation system | 0.36448 | 0.191732 | 52.60% |
| Linear regression | 0.224669 | 1.83595 | 817.18% |
| Singular value decomposition | 2.53434 | 3.56501 | 140.67% |
| Fast fourier transform | 0.0452899 | 0.0784928 | 173.31% |
| Bubblesort | 43.3166 | 43.5649 | 100.57% |

As seen above, GNU Octave linked with Intel MKL lost only once against FOSS, in the Linear equation system test, where it took about 1.9 times as long. The speedup figures are the FOSS runtime expressed as a percentage of the MKL runtime, so 100% means parity. In the remaining tests MKL won, with figures ranging from 100.57% in Bubblesort (which may be just within the margin of error) to a whopping 817.18% in Linear regression. Granted, Linear regression probably gains most of its performance from a better BLAS/LAPACK implementation, but the Fast fourier transform test shows that MKL’s FFT also beat AMD Optimized FFTW by a healthy 173.31%. On this machine and benchmark, then, linking GNU Octave with Intel MKL generally gives better performance than the FOSS libraries.
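For readers who want to recompute the percentages from the runtimes, the figures in the table are consistent with dividing the FOSS runtime by the MKL runtime. A small sketch, where speedup_pct is a hypothetical helper name and awk does the floating-point arithmetic:

```shell
# speedup_pct MKL_SECONDS FOSS_SECONDS
# Prints the FOSS runtime as a percentage of the MKL runtime.
# 100% means parity; above 100% means the MKL build was faster.
speedup_pct() {
  awk -v mkl="$1" -v foss="$2" 'BEGIN { printf "%.2f%%\n", foss / mkl * 100 }'
}

speedup_pct 0.722554 0.917274   # pseudo-inverse: prints 126.95%
speedup_pct 0.36448 0.191732    # linear equation system: prints 52.60%
```

Read this way, 817.18% for Linear regression means the FOSS build took roughly 8.2 times as long as the MKL build, while 52.60% for Linear equation system means the FOSS build needed only about half the MKL build's time.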

Closing Words

GNU Octave is good software with good support for FOSS libraries. Some people will be happy to build GNU Octave around FOSS BLAS/LAPACK/FFTW libraries. But as we can see above, by adjusting GNU Octave to take advantage of Intel Math Kernel Library (which is, of course, not FOSS), we can improve its performance by up to about 8x over the FOSS build in this benchmark. Hopefully this post helps you build your own Octave linked against Intel MKL and get the most out of your machine.