Part I: Neo4j vs Oracle Performance Test
Introduction
Let's take a typical graph-database task: finding the depth-N friends of a given profile in a social network. We will compare how fast Oracle and Neo4j perform this search in the same hardware environment.
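To pin the task down: by the "depth-N friends" of a profile I mean the profiles reachable over paths of exactly N friendship edges, which is what both queries below compute. A minimal Python sketch of that expansion (the toy adjacency data is an assumption for illustration, not the Pokec dataset):

```python
# Profiles reachable over paths of exactly n friendship edges,
# mirroring the nested-IN SQL and the [:KNOWS*n] Cypher used later.
def friends_at_depth(adj, start, n):
    frontier = {start}
    for _ in range(n):
        # expand every current endpoint by one more edge
        frontier = {f for u in frontier for f in adj.get(u, ())}
    return frontier

# Toy graph (made-up IDs, not the Pokec data)
adj = {
    1664: {1, 2},
    1: {3},
    2: {3, 4},
    3: {1664},
}

print(len(friends_at_depth(adj, 1664, 1)))  # 2 -> {1, 2}
print(len(friends_at_depth(adj, 1664, 2)))  # 2 -> {3, 4}
```

Note that a node can show up again at a deeper level (here 1664 itself reappears at depth 3), which is why both benchmark queries deduplicate the result.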
Hardware
Intel i7 2.3 GHz, SSD; VM (macOS Mojave + Parallels 12): 4 cores, 8 GB RAM
ORACLE Part
Oracle 19c Enterprise Edition out of the box, with no tuning and no increase of SGA, PGA, etc.:
as sys:
SHOW SGA
Total System Global Area 2516582192 bytes
Fixed Size 9137968 bytes
Variable Size 369098752 bytes
Database Buffers 2130706432 bytes
Redo Buffers 7639040 bytes
select value/1024/1024/1024 from V$PGASTAT where name = 'aggregate PGA target parameter';
1.15625
ORA TEMPORARY SEGMENT = 40 GB
as scott:
CREATE TABLE soc_pokec_profiles (
    user_id                      NUMBER,
    s_public                     CHAR(1),
    completion_percentage        NUMBER,
    gender                       CHAR(4),
    region                       VARCHAR2(4000),
    last_login                   DATE,
    registration                 DATE,
    age                          NUMBER,
    body                         VARCHAR2(4000),
    working_in_field             VARCHAR2(4000),
    spoken_languages             VARCHAR2(4000),
    hobbies                      VARCHAR2(4000),
    enjoy_good_food              VARCHAR2(4000),
    pets                         VARCHAR2(4000),
    body_type                    VARCHAR2(4000),
    my_eyesight                  VARCHAR2(4000),
    eye_color                    VARCHAR2(4000),
    hair_color                   VARCHAR2(4000),
    hair_type                    VARCHAR2(4000),
    completed_level_of_education VARCHAR2(4000),
    favourite_color              VARCHAR2(4000),
    relation_to_smoking          VARCHAR2(4000),
    relation_to_alcohol          VARCHAR2(4000),
    sign_in_zodiac               VARCHAR2(4000),
    CONSTRAINT spp$user_id PRIMARY KEY (user_id)
);

CREATE TABLE soc_pokec_relationships (
    user_id   NUMBER,
    friend_id NUMBER,
    CONSTRAINT spr$user_id$friend_id PRIMARY KEY (user_id, friend_id)
);
Load data into Oracle
Data from https://snap.stanford.edu/data/soc-pokec.html, loaded with SQL*Loader.
Profiles
ora_load_profiles.ctl:
load data
replace into table soc_pokec_profiles
fields terminated by '\t'
(
    user_id,
    s_public,
    completion_percentage,
    gender,
    region char(4000),
    last_login timestamp "YYYY-MM-DD HH24:MI:SSXFF",
    registration timestamp "YYYY-MM-DD HH24:MI:SSXFF",
    age,
    body char(4000),
    working_in_field char(4000),
    spoken_languages char(4000),
    hobbies char(4000),
    enjoy_good_food char(4000),
    pets char(4000),
    body_type char(4000),
    my_eyesight char(4000),
    eye_color char(4000),
    hair_color char(4000),
    hair_type char(4000),
    completed_level_of_education char(4000),
    favourite_color char(4000),
    relation_to_smoking char(4000),
    relation_to_alcohol char(4000),
    sign_in_zodiac char(4000)
)
[root@graph ~]# $ORACLE_HOME/bin/sqlldr userid=scott/tiger control=ora_load_profiles.ctl log=./p.log bad=./p.bad data=soc-pokec-profiles.txt direct=true multithreading=true errors=1000
Table SOC_POKEC_PROFILES:
1632577 Rows successfully loaded.
226 Rows not loaded due to data errors.
Relationships
ora_load_rel.ctl:
load data
replace into table soc_pokec_relationships
fields terminated by '\t'
( user_id, friend_id )
[root@graph ~]# $ORACLE_HOME/bin/sqlldr userid=scott/tiger control=ora_load_rel.ctl log=./r.log bad=./r.bad data=soc-pokec-relationships.txt direct=true multithreading=true
Table SOC_POKEC_RELATIONSHIPS:
30618640 Rows successfully loaded.
Delete orphaned relationships (some profiles were not loaded because of data errors)
DELETE FROM (
    SELECT * FROM soc_pokec_relationships
    WHERE user_id   NOT IN (SELECT user_id FROM soc_pokec_profiles)
       OR friend_id NOT IN (SELECT user_id FROM soc_pokec_profiles)
);
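The cleanup rule is simple: keep a relationship only if both of its endpoints exist among the loaded profiles. A Python sketch of the same filter (the IDs here are made up):

```python
# Keep only relationships whose endpoints both exist as loaded profiles,
# the rule the DELETE above enforces (IDs are made up for illustration).
profiles = {1, 2, 3}
relationships = [(1, 2), (2, 3), (1, 99), (99, 3)]  # profile 99 failed to load

valid = [(u, f) for (u, f) in relationships
         if u in profiles and f in profiles]
print(valid)  # [(1, 2), (2, 3)]
```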
Compute statistics
analyze table SOC_POKEC_RELATIONSHIPS compute statistics;
(In current Oracle releases, DBMS_STATS.GATHER_TABLE_STATS is the recommended way to gather optimizer statistics.)
Export profiles & relationships into CSV-files
as sys:
CREATE DIRECTORY CSV_DIR AS '/gdbms';
GRANT READ ON DIRECTORY CSV_DIR TO SCOTT;
GRANT WRITE ON DIRECTORY CSV_DIR TO SCOTT;
as scott:
DECLARE
    OUTPUT   UTL_FILE.FILE_TYPE;
    FILENAME VARCHAR2(100);
BEGIN
    FILENAME := 'soc_profile.csv';
    OUTPUT := UTL_FILE.FOPEN('CSV_DIR', FILENAME, 'W');
    UTL_FILE.PUT_LINE(OUTPUT, 'user_id:ID');
    FOR I IN (SELECT * FROM soc_pokec_profiles) LOOP
        UTL_FILE.PUT_LINE(OUTPUT, I.user_id);
    END LOOP;
    UTL_FILE.FCLOSE(OUTPUT);
END;
DECLARE
    OUTPUT   UTL_FILE.FILE_TYPE;
    FILENAME VARCHAR2(100);
BEGIN
    FILENAME := 'soc_relationships.csv';
    OUTPUT := UTL_FILE.FOPEN('CSV_DIR', FILENAME, 'W');
    -- header: relationship start/end node IDs (no trailing comma,
    -- which would declare an empty third column)
    UTL_FILE.PUT_LINE(OUTPUT, ':START_ID,:END_ID');
    FOR I IN (SELECT * FROM soc_pokec_relationships) LOOP
        UTL_FILE.PUT_LINE(OUTPUT, I.user_id || ',' || I.friend_id);
    END LOOP;
    UTL_FILE.FCLOSE(OUTPUT);
END;
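The two files follow the neo4j-admin import header conventions: a `:ID` column for nodes and `:START_ID`/`:END_ID` columns for relationships. For reference, a minimal Python sketch producing files of the same shape (the sample rows are assumptions standing in for the Oracle cursor loops above):

```python
import csv

# Sample data standing in for the Oracle tables (assumed values).
profile_ids = [1, 2, 1664]
relationships = [(1, 2), (1664, 1)]

with open("soc_profile.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["user_id:ID"])            # node ID header for neo4j-admin import
    for pid in profile_ids:
        w.writerow([pid])

with open("soc_relationships.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow([":START_ID", ":END_ID"])  # relationship endpoint headers
    for u, v in relationships:
        w.writerow([u, v])
```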
Neo4j Part
[root@graph ~]# nano /etc/neo4j/neo4j.conf
# The name of the database to mount
#dbms.active_database=graph.db
dbms.active_database=perf.db
# Paths of directories in the installation.
dbms.directories.data=/gdbms/neo4j/data
dbms.directories.logs=/gdbms/neo4j
#dbms.memory.heap.max_size=512m
dbms.memory.heap.max_size=3584m
#dbms.memory.pagecache.size=10g
dbms.memory.pagecache.size=4g
Import the CSV files into Neo4j
[root@graph ~]# neo4j-admin import --database=perf.db --nodes:Profile=./soc_profile.csv --relationships:KNOWS=./soc_relationships.csv --id-type=INTEGER
[root@graph ~]# chown -R neo4j:neo4j /gdbms/neo4j/data/databases/perf.db
Run Part
Test profile: user_id = 1664
Depth | Oracle time (rec_cnt) | Neo4j time (rec_cnt) |
---|---|---|
1 | 00:00:00.003 (169) | 485 ms (169) |
2 | 00:00:00.014 (5460) | 497 ms (5460) |
3 | 00:00:00.057 (120809) | 864 ms (120809) |
4 | 00:00:02.469 (800479) | 5676 ms (800479) |
5 | 00:00:19.443 (1369889) | 27289 ms (1369889) |
6 | 00:00:38.180 (1490064) | 118870 ms (1490064) |
7 | 00:02:55.999 (1502711) | not finished in 1 hour: CPU 156.5%, time 95:17.95 (java) |
7' | 00:00:49.539 (1502711), after tuning: memory management set to manual, memory_target = 6.4 GB (*) | not finished in 1 hour: CPU 156.5%, time 95:17.95 (java) |
8 | 00:02:29.073 (1503884) | n/a |
9 | 00:02:46.219 (1504102) | n/a |
10 | 00:02:50.576 (1504110) | n/a |
11 | 00:02:57.122 (1504111) | n/a |
12 | 00:02:52.708 (1504111) | n/a |
* Oracle memory settings after tuning for run 7':
SHOW SGA
Total System Global Area 6777994992 bytes
Fixed Size 9150192 bytes
Variable Size 4630511616 bytes
Database Buffers 2130706432 bytes
Redo Buffers 7626752 bytes
select value/1024/1024/1024 from V$PGASTAT where name = 'aggregate PGA target parameter';
3.96875
ORA TEMPORARY SEGMENT = 40 GB
SQL query for depth 7':
SELECT count(*) FROM (
  SELECT /*+ PARALLEL(4) */ DISTINCT friend_id FROM soc_pokec_relationships r WHERE user_id IN (
    SELECT /*+ PARALLEL(4) */ DISTINCT friend_id FROM soc_pokec_relationships r WHERE user_id IN (
      SELECT /*+ PARALLEL(4) */ DISTINCT friend_id FROM soc_pokec_relationships r WHERE user_id IN (
        SELECT /*+ PARALLEL(4) */ DISTINCT friend_id FROM soc_pokec_relationships r WHERE user_id IN (
          SELECT /*+ PARALLEL(4) */ friend_id FROM soc_pokec_relationships r WHERE user_id IN (
            SELECT /*+ PARALLEL(4) */ friend_id FROM soc_pokec_relationships r WHERE r.user_id IN (
              SELECT /*+ PARALLEL(4) */ friend_id FROM soc_pokec_relationships r WHERE r.user_id = 1664
            )
          )
        )
      )
    )
  )
);
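Each additional depth level only wraps the statement in one more `SELECT ... WHERE user_id IN (...)`, so queries for arbitrary depth can be generated instead of typed by hand. A small Python sketch (for simplicity it applies DISTINCT at every level, unlike the hand-written query, where the innermost levels omit it):

```python
def deep_friends_sql(user_id, depth, parallel=4):
    """Build the nested-IN depth-N friends query, one subquery per level."""
    select = (f"SELECT /*+ PARALLEL({parallel}) */ DISTINCT friend_id "
              f"FROM soc_pokec_relationships")
    query = f"{select} WHERE user_id = {user_id}"   # depth 1: direct friends
    for _ in range(depth - 1):                      # each level adds one IN (...)
        query = f"{select} WHERE user_id IN ({query})"
    return f"SELECT count(*) FROM ({query})"

# 25 nested SELECTs plus the outer count
print(deep_friends_sql(1664, 25).count("SELECT"))  # 26
```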
Cypher query (depth = 7):
MATCH (p:Profile{user_id:1664})-[:KNOWS*7]->(pp:Profile) RETURN count(DISTINCT pp.user_id);
Conclusions
1) Oracle was consistently faster than Neo4j on a task and dataset that are typical for Neo4j: roughly 2-3 times faster at the deeper levels and far more at shallow depths.
2) The article "How much faster is a graph database, really?" raised doubts, and the article "Benchmarketing – Neo4j and MySQL" confirmed them.
3) By depth 12 the friend count stays CONSTANT (the whole reachable set of profiles has been found), and the query time stays roughly constant as well.
4) The article "NEO4J IS FASTER THAN MYSQL IN PERFORMING RECURSIVE QUERY" is certainly interesting, but if we have to write Java code to tune every typical graph-DB case, we might as well write that code (Java, PL/SQL, C, asm...) inside Oracle :-). And, as the preceding point shows, the Oracle execution time for depth = 12, 13, ..., 25 stays constant.
See also: Part II: Neo4j vs Oracle Performance Test
#neo4j #oracle #performance
Bonus:
In his article, the author says: "Also remember that this is a very tiny graph. 1 Million nodes and 10 Million relationships, heading 4 levels deep. Try this with 100 Million nodes, or going 25 levels deep. MySQL, Postgres, Oracle Exadata, all of them will bust and Neo4j will keep on going."
Really? OK, let's run depth = 25 on Oracle...
The time on my MacBook: 00:03:03.106 (and the record count is, of course, 1504111).
Q.E.D. (quod erat demonstrandum)
And I suspect that Oracle Exadata has somewhat more hardware resources than my laptop... :)
SQL query for depth = 25 (it could probably be shortened with CONNECT BY, but for simplicity it is just one more nested SELECT per depth level):