Synthetic Performance Test: GCC vs Intel ICC vs LuaJIT vs LuaJIT+FFI vs Java
Hardware
Host: Intel Core i7 2.3 GHz, 4 cores (8 logical), 16 GB RAM, SSD, macOS Mojave + Parallels 12.
VM: Oracle Linux 7.6, configured with 4 cores and 8 GB RAM.
C Test File
#include <stdio.h>

#define N 4000
#define S 1000

struct t {
    double a, b, f;
};

int main(int argc, char **argv) {
    int i, j;
    struct t t[N];

    /* Initialise all elements. */
    for (i = 0; i < N; i++) {
        t[i].a = 0;
        t[i].b = 1;
        t[i].f = i * 0.25;
    }

    /* S steps over all N elements; print t[1].a after each step. */
    for (j = 0; j < S; j++) {
        for (i = 0; i < N; i++) {
            t[i].a += t[i].b * t[i].f;
            t[i].b -= t[i].a * t[i].f;
        }
        printf("%.6f\n", t[1].a);
    }
    return 0;
}
GCC (4.8.5)
gcc lua_perf.c -o lua_perf -O3 -Wall -march=native -ftree-vectorize -ftree-parallelize-loops=4 -floop-parallelize-all
time ./lua_perf > /dev/null
real 0m5.604s
user 0m22.263s
sys 0m0.118s
top
404 root 20 0 36468 4624 1356 R 400.0(%CPU) 0.1(%MEM) 0:09.08 lua_perf
GCC8 (8.3.1 from devtoolset-8)
real 0m5.695s
user 0m22.632s
sys 0m0.124s
ICC (19.0.4.235)
/opt/intel/system_studio_2019/bin/icc lua_perf.c -O2 -o lua_perf_icc -no-prec-div -ipo -xSSE4.2 -parallel
time ./lua_perf_icc > /dev/null
real 0m5.322s
user 0m21.186s
sys 0m0.074s
top
14344 root 20 0 247848 6588 3100 R 400.0 (%CPU) 0.1(%MEM) 0:14.55 lua_perf_icc
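Both compilers were given auto-parallelization flags (-ftree-parallelize-loops=4 for GCC, -parallel for ICC), which is why top shows 400% CPU and user time is about four times real time. Only the inner i loop can be split across threads: each element's update reads and writes only that element's a, b and f, while each step of the outer j loop needs the values left by the previous step. Below is a minimal hand-written sketch of that split using OpenMP; it is an illustration, not the code GCC or ICC actually generate, and init_elems, step_parallel and run are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Same element layout as the C test file. */
struct t {
    double a, b, f;
};

/* Initialise n elements exactly as the benchmark does. */
void init_elems(struct t *elems, int n) {
    assert(elems != NULL && n > 0);
    for (int i = 0; i < n; i++) {
        elems[i].a = 0;
        elems[i].b = 1;
        elems[i].f = i * 0.25;
    }
}

/* One step of the outer j loop. Elements are independent of each other,
   so the i loop may be divided among threads. The pragma is only emitted
   when the compiler has OpenMP enabled (it then defines _OPENMP);
   otherwise the loop simply runs serially. */
void step_parallel(struct t *elems, int n) {
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int i = 0; i < n; i++) {
        elems[i].a += elems[i].b * elems[i].f;
        elems[i].b -= elems[i].a * elems[i].f;
    }
}

/* The outer loop stays serial: step j depends on the result of step j-1.
   Returns elems[1].a after s steps (the value the benchmark prints). */
double run(struct t *elems, int n, int s) {
    assert(n > 1 && s >= 0);
    init_elems(elems, n);
    for (int j = 0; j < s; j++)
        step_parallel(elems, n);
    return elems[1].a;
}
```

Compiled with gcc -O3 -fopenmp, step_parallel uses the default OpenMP thread count (the OMP_NUM_THREADS environment variable can set it to 4 to match the VM); compiled without -fopenmp, the same code performs the same updates serially.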
Lua Test File
local N = 4000
local S = 1000
local t = {}
-- N elements, indices 0..N-1, as in the C and FFI versions
for i = 0, N-1 do
    t[i] = { a = 0, b = 1, f = i * 0.25 }
end
for j = 0, S-1 do
    for i = 0, N-1 do
        t[i].a = t[i].a + t[i].b * t[i].f
        t[i].b = t[i].b - t[i].a * t[i].f
    end
    print(string.format("%.6f", t[1].a))
end
LuaJIT
time /usr/local/openresty/luajit/bin/luajit lua_perf.lua > /dev/null
real 3m4.680s
user 3m4.612s
sys 0m0.042s
15581 root 20 0 38572 28360 2000 R 100.0(%CPU) 0.4(%MEM) 0:04.08 luajit
Lua+FFI Test File
--collectgarbage('setpause', 2000)
local ffi = require("ffi")
ffi.cdef[[
typedef struct { double a, b, f; } table_elem;
]]
local N = 140000
local S = 110000
local t = ffi.new("table_elem[?]", N)
for i = 0, N-1 do
    t[i].a = 0.0
    t[i].b = 1.0
    t[i].f = i * 0.25
end
for j = 0, S-1 do
    for i = 0, N-1 do
        t[i].a = t[i].a + t[i].b * t[i].f
        t[i].b = t[i].b - t[i].a * t[i].f
    end
    print(string.format("%.6f", t[1].a))
end
LuaJIT+FFI
time /usr/local/openresty/luajit/bin/luajit lua_perf_ffi.lua > /dev/null
real 0m22.603s
user 0m22.589s
sys 0m0.012s
15625 root 20 0 17920 7508 1968 R 100.0(%CPU) 0.1(%MEM) 0:08.00 luajit
Java Test File
class lua_perf {
    public double a, b, f;
    static final int N = 4000;
    static final int S = 1000;

    public static void main(String[] argv) {
        int i, j;
        // One lua_perf object per element, referenced from an array.
        lua_perf[] t = new lua_perf[N];
        for (i = 0; i < N; i++) {
            t[i] = new lua_perf();
            t[i].a = 0;
            t[i].b = 1;
            t[i].f = i * 0.25;
        }
        // S steps over all N elements; print t[1].a after each step.
        for (j = 0; j < S; j++) {
            for (i = 0; i < N; i++) {
                t[i].a += t[i].b * t[i].f;
                t[i].b -= t[i].a * t[i].f;
            }
            System.out.println(t[1].a);
        }
    }
}
Java (default JVM settings, no tuning options)
java version "1.8.0_201" Java(TM) SE Runtime Environment (build 1.8.0_201-b09) Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
/u01/app/oracle/product/19.0.0/dbhome_1/jdk/bin/javac lua_perf.java
time /u01/app/oracle/product/19.0.0/dbhome_1/jdk/bin/java lua_perf > /dev/null
real 0m33.052s
user 0m32.849s
sys 0m0.425s
15516 root 20 0 4452640 37940 16056 S 100.7(%CPU) 0.6(%MEM) 0:07.33 java
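The C and LuaJIT+FFI versions keep all elements in one contiguous array of structs (ffi.new("table_elem[?]", N) allocates a C array), whereas the plain Lua version builds a separate table per element and the Java version a separate lua_perf object per element, referenced from an array. Memory layout is one plausible contributor to the timing differences, though these runs do not isolate it. The C sketch below reproduces the two layouts with identical arithmetic so that they can be timed against each other; struct elem, run_contiguous and run_pointers are hypothetical names, and the pointer version only approximates the Java layout (a Lua table additionally keeps its string-keyed fields a, b, f in a hash part).

```c
#include <assert.h>
#include <stdlib.h>

struct elem {
    double a, b, f;
};

/* Layout 1: one contiguous block of structs, as in the C test file and
   the FFI version. Returns t[1].a after s steps, or -1.0 if malloc fails. */
double run_contiguous(int n, int s) {
    assert(n > 1 && s >= 0);
    struct elem *t = malloc((size_t)n * sizeof *t);
    if (t == NULL)
        return -1.0;
    for (int i = 0; i < n; i++) {
        t[i].a = 0;
        t[i].b = 1;
        t[i].f = i * 0.25;
    }
    for (int j = 0; j < s; j++)
        for (int i = 0; i < n; i++) {
            t[i].a += t[i].b * t[i].f;
            t[i].b -= t[i].a * t[i].f;
        }
    double result = t[1].a;
    free(t);
    return result;
}

/* Layout 2: an array of pointers to separately allocated elements, roughly
   the shape of the Java array of objects. Same return convention. */
double run_pointers(int n, int s) {
    assert(n > 1 && s >= 0);
    struct elem **t = malloc((size_t)n * sizeof *t);
    if (t == NULL)
        return -1.0;
    int made = 0;          /* number of elements allocated so far */
    double result = -1.0;
    for (; made < n; made++) {
        t[made] = malloc(sizeof **t);
        if (t[made] == NULL)
            goto cleanup;
        t[made]->a = 0;
        t[made]->b = 1;
        t[made]->f = made * 0.25;
    }
    for (int j = 0; j < s; j++)
        for (int i = 0; i < n; i++) {
            t[i]->a += t[i]->b * t[i]->f;
            t[i]->b -= t[i]->a * t[i]->f;
        }
    result = t[1]->a;
cleanup:
    for (int i = 0; i < made; i++)
        free(t[i]);
    free(t);
    return result;
}
```

Both functions perform exactly the same floating-point operations, so timing them with the same large n and s (for example from a small driver run under time, as above) would indicate how much the layout alone costs on a given machine.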
File sizes
-rwxr-xr-x. 1 root root 8.6K Aug 9 18:13 lua_perf
-rw-r--r--. 1 root root 572 Aug 9 17:43 lua_perf.c
-rw-r--r--. 1 root root 822 Aug 9 19:04 lua_perf.class
-rw-r--r--. 1 root root 438 Aug 9 17:25 lua_perf_ffi.lua
-rwxr-xr-x. 1 root root 8.3K Aug 9 18:26 lua_perf_gcc8
-rwxr-xr-x. 1 root root 28K Aug 9 18:28 lua_perf_icc
-rw-r--r--. 1 root root 777 Aug 9 19:04 lua_perf.java
-rw-r--r--. 1 root root 380 Aug 9 19:03 lua_perf.lua
Conclusions
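GCC and ICC perform similarly; ICC is slightly faster in this particular test (about 1.05x vs GCC 4.8.5 and 1.07x vs GCC 8.3.1, wall-clock). LuaJIT+FFI has C-like performance: its single-threaded run (22.6 s) takes about as long as the total CPU time of the auto-parallelized C binaries (21.2 to 22.6 s user), so it would need parallelism to match their wall-clock time. Plain LuaJIT (without FFI) took 3 min 4.7 s, about 8x longer than LuaJIT+FFI, which is still not bad for a scripting language. Java performs well (33.1 s, single-threaded) but, as usual, uses more memory: top shows about 4.2 GiB virtual and 37 MiB resident, against about 36 MiB virtual and 4.5 MiB resident for the GCC binary. Pure Lua (the interpreter, without JIT) was not tested, since it would a priori be too slow for this test.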
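The ratios in the conclusions can be recomputed from the real and user lines of each run. A short C sketch, with the timings copied from the runs above: slowdown() divides one wall-clock time by another, and busy_cores() divides user time by real time, which approximates how many cores a run kept busy on average (sys time is ignored). The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Wall-clock ("real") and CPU ("user") seconds copied from the runs above. */
struct run {
    const char *name;
    double real_s;
    double user_s;
};

static const struct run runs[] = {
    {"GCC 4.8.5",    5.604,  22.263},
    {"GCC 8.3.1",    5.695,  22.632},
    {"ICC 19.0",     5.322,  21.186},
    {"LuaJIT",     184.680, 184.612},
    {"LuaJIT+FFI",  22.603,  22.589},
    {"Java 8",      33.052,  32.849},
};

/* How many times longer run x took than run y, by wall-clock time. */
double slowdown(const struct run *x, const struct run *y) {
    assert(y->real_s > 0);
    return x->real_s / y->real_s;
}

/* user/real: average number of cores busy in user mode during the run. */
double busy_cores(const struct run *r) {
    assert(r->real_s > 0);
    return r->user_s / r->real_s;
}

/* Print every run relative to ICC, the fastest wall-clock result. */
void print_summary(void) {
    const struct run *icc = &runs[2];
    for (size_t i = 0; i < sizeof runs / sizeof runs[0]; i++)
        printf("%-11s %6.2fx ICC wall-clock, %4.2f busy cores\n",
               runs[i].name, slowdown(&runs[i], icc), busy_cores(&runs[i]));
}
```

Calling print_summary() from a main function prints one line per run; for example, GCC 4.8.5 comes out at about 1.05x ICC's wall-clock time with about 3.97 busy cores.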