homme.io
Clean.Precise.Quick.
..
PAX ROMANA
Фотография
Философия
Искусство
История
C/C++
DBMS
Oracle
Linux
IT

Infinitum.Aeterna
Genre.Photography
2020.Арктика
2020.Greece
2019.Egypt
2019.Italy
2019.Kulikovo Field
2019.Baikal
2018.Крым
2018.Elbrus
2015.Egypt

Part I: Neo4j vs Oracle Performance Test

Introduction

Let's take typical graphDB task to finding deep=N friends of specified profile in social network and compare performance of this finding on Oracle and on Neo4j at the same Hardware environment.

Hardware

Intel i7 2.3 GHz, SSD; VM (macOS Mojave + Parallels 12): 4 cores, 8 GB RAM

ORACLE Part

Oracle 19C Ent. Edition out of the box with no tuning and increasing SGA, PGA, etc:

as sys:

SHOW SGA

Total System Global Area   2516582192 bytes      
Fixed Size                           9137968 bytes      
Variable Size                       369098752 bytes      
Database Buffers                2130706432 bytes      
Redo Buffers                       7639040 bytes

select value/1024/1024/1024 from V$PGASTAT where name = 'aggregate PGA target parameter';

1.15625

ORA TEMPORARY SEGMENT = 40 GB

as scott:

CREATE TABLE soc_pokec_profiles (
    user_id NUMBER,
    s_public CHAR(1),
    completion_percentage NUMBER,
    gender CHAR(4),
    region VARCHAR2(4000),
    last_login DATE,
    registration DATE,
    age NUMBER,
    body VARCHAR2(4000),
    working_in_field VARCHAR2(4000),
    spoken_languages VARCHAR2(4000),
    hobbies VARCHAR2(4000),
    enjoy_good_food VARCHAR2(4000),
    pets VARCHAR2(4000),
    body_type VARCHAR2(4000),
    my_eyesight VARCHAR2(4000),
    eye_color VARCHAR2(4000),
    hair_color VARCHAR2(4000),
    hair_type VARCHAR2(4000),
    completed_level_of_education VARCHAR2(4000),
    favourite_color VARCHAR2(4000),
    relation_to_smoking VARCHAR2(4000),
    relation_to_alcohol VARCHAR2(4000),
    sign_in_zodiac VARCHAR2(4000),
    CONSTRAINT spp$user_id PRIMARY KEY (user_id)
);

CREATE TABLE soc_pokec_relationships (
    user_id                        NUMBER,
    friend_id                      NUMBER,
    CONSTRAINT spr$user_id$friend_id PRIMARY KEY (user_id,friend_id)
);

Load data into Oracle

from https://snap.stanford.edu/data/soc-pokec.html by SQL*Loader

Profiles

ora_load_profiles.ctl:

load data
replace
into table soc_pokec_profiles
fields terminated by '\t'
(
    user_id,
    s_public,
    completion_percentage,
    gender,
    region char(4000),
    last_login timestamp "YYYY-MM-DD HH24:MI:SSXFF",
    registration timestamp "YYYY-MM-DD HH24:MI:SSXFF",
    age,
    body char(4000),
    working_in_field char(4000),
    spoken_languages char(4000),
    hobbies char(4000),
    enjoy_good_food char(4000),
    pets char(4000),
    body_type char(4000),
    my_eyesight char(4000),
    eye_color char(4000),
    hair_color char(4000),
    hair_type char(4000),
    completed_level_of_education char(4000),
    favourite_color char(4000),
    relation_to_smoking char(4000),
    relation_to_alcohol char(4000),
    sign_in_zodiac char(4000)
)
[root@graph ~]# $ORACLE_HOME/bin/sqlldr userid=scott/tiger control=ora_load_profiles.ctl 
log=./p.log bad=./p.bad data=soc-pokec-profiles.txt direct=true multithreading=true errors=1000

Table SOC_POKEC_PROFILES:
  1632577 Rows successfully loaded.
  226 Rows not loaded due to data errors.

Relations

ora_load_rel.ctl:

load data
replace
into table soc_pokec_relationships
fields terminated by '\t'
(
 user_id,
 friend_id
)
[root@graph ~]# $ORACLE_HOME/bin/sqlldr userid=scott/tiger control=ora_load_rel.ctl
log=./r.log bad=./r.bad data=soc-pokec-relationships.txt direct=true multithreading=true

Table SOC_POKEC_RELATIONSHIPS:
  30618640 Rows successfully loaded.
 

Delete unused relationships (due not loaded some bad profiles)

DELETE FROM (
SELECT * FROM soc_pokec_relationships
WHERE
    user_id NOT IN (select user_id from soc_pokec_profiles)
OR  friend_id NOT IN (select user_id from soc_pokec_profiles)
)
;

Compute statistics

analyze table SOC_POKEC_RELATIONSHIPS compute statistics;

Export profiles & relationships into CSV-files

as sys:

CREATE DIRECTORY CSV_DIR AS '/gdbms';
GRANT READ ON DIRECTORY CSV_DIR TO SCOTT;
GRANT WRITE ON DIRECTORY CSV_DIR TO SCOTT;


as scott:

DECLARE
    OUTPUT UTL_FILE.FILE_TYPE;
    FILENAME VARCHAR2(100);

BEGIN
    FILENAME := 'soc_profile.csv';
    OUTPUT := UTL_FILE.FOPEN ( 'CSV_DIR', FILENAME, 'W' );
    UTL_FILE.PUT_LINE(OUTPUT, 'user_id:ID');     

    FOR I IN (SELECT * FROM soc_pokec_profiles) LOOP
        UTL_FILE.PUT_LINE ( OUTPUT, I.user_id );        
    END LOOP;

    UTL_FILE.FCLOSE(OUTPUT);
END;

 

DECLARE     
    OUTPUT UTL_FILE.FILE_TYPE;
    FILENAME VARCHAR2(100);
BEGIN
    FILENAME := 'soc_relationships.csv';
    OUTPUT := UTL_FILE.FOPEN ( 'CSV_DIR', FILENAME, 'W' );
    UTL_FILE.PUT_LINE(OUTPUT, ':START_ID,:END_ID,');

    FOR I IN (SELECT * FROM soc_pokec_relationships) LOOP
        UTL_FILE.PUT_LINE ( OUTPUT, I.user_id || ',' || I.friend_id );        
    END LOOP;

    UTL_FILE.FCLOSE(OUTPUT);
END;

 

Neo4j Part

[root@graph ~]# nano /etc/neo4j/neo4j.conf

# The name of the database to mount
#dbms.active_database=graph.db
dbms.active_database=perf.db

# Paths of directories in the installation.
dbms.directories.data=/gdbms/neo4j/data
dbms.directories.logs=/gdbms/neo4j

#dbms.memory.heap.max_size=512m
dbms.memory.heap.max_size=3584m

#dbms.memory.pagecache.size=10g
dbms.memory.pagecache.size=4g

Import CSV-files into Neo4j

[root@graph ~]# neo4j-admin import --database=perf.db --nodes:Profile=./soc_profile.csv
--relationships:KNOWS=./soc_relationships.csv --id-type=INTEGER

[root@graph ~]# chown -R neo4j:neo4j /gdbms/neo4j/data/databases/perf.db

 

Run Part

user_id = 1664

Deep Oracle time (rec_cnt) Neo4j time (rec_cnt)
1 00:00:00.003 (169) 485 ms (169)
2 00:00:00.014 (5460) 497 ms (5460)
3 00:00:00.057 (120809) 864 ms (120809)
4 00:00:02.469 (800479) 5676 ms (800479)
5 00:00:19.443 (1369889) 27289 ms (1369889)
6 00:00:38.180 (1490064) 118870 ms (1490064)
7 00:02:55.999 (1502711) Not Finished in 1 Hour: CPU 156.5%  Time: 95:17.95 java
7' 00:00:49.539 (1502711) after tuning: set memory management to manual and memory_target to 6.4 GB * Not Finished in 1 Hour: CPU 156.5%  Time: 95:17.95 java
8 00:02:29.073 (1503884) ---
9 00:02:46.219 (1504102) ---
10 00:02:50.576 (1504110) ---
11 00:02:57.122 (1504111) ---
12 00:02:52.708 (1504111)  

* -

SHOW SGA

Total System Global Area   6777994992 bytes      
Fixed Size                           9150192 bytes      
Variable Size                       4630511616 bytes      
Database Buffers                2130706432 bytes      
Redo Buffers                       7626752 bytes    

select value/1024/1024/1024 from V$PGASTAT where name = 'aggregate PGA target parameter';

3.96875

ORA TEMPORARY SEGMENT = 40 GB

SQL QUERY for Deep=12:

SQL QUERY for Deep=11:

SQL QUERY for Deep=9:

SQL QUERY for Deep=8:

SQL QUERY for Deep=7:

SQL QUERY for Deep=7':

SELECT count(*) FROM (
SELECT /*+ PARALLEL(4) */
  DISTINCT friend_id
FROM soc_pokec_relationships r
WHERE user_id IN
(
SELECT /*+ PARALLEL(4) */
  DISTINCT friend_id
FROM soc_pokec_relationships r
WHERE user_id IN
(
SELECT /*+ PARALLEL(4) */
  DISTINCT friend_id
FROM soc_pokec_relationships r
WHERE user_id IN
(
SELECT /*+ PARALLEL(4) */
    DISTINCT friend_id
FROM soc_pokec_relationships r
WHERE user_id IN
(
SELECT /*+ PARALLEL(4) */
  friend_id
FROM soc_pokec_relationships r
WHERE user_id IN
(  
SELECT /*+ PARALLEL(4) */
  friend_id
FROM soc_pokec_relationships r
WHERE r.user_id IN 
(
SELECT /*+ PARALLEL(4) */
  friend_id
FROM soc_pokec_relationships r
WHERE r.user_id = 1664
)
)
)
)  
)
)
)
;

SQL QUERY for Deep=6:

SQL QUERY for Deep=5:

 

CYPHER QUERY (Deep=7):

MATCH (p:Profile{user_id:1664})-[:KNOWS*7]->(pp:Profile) RETURN count(DISTINCT pp.user_id);

 

Conclusions

1) Oracle always faster in 2,3....and so on times than Neo4j in typical for Neo4j task and dataset.

2) This article How much faster is a graph database, really? raised doubts. And the article Benchmarketing – Neo4j and MySQL confirmed these doubts.

3) In 12 step of deep friends count is stayed CONSTANT. And the time of query stayed the CONSTANT.

4) This article NEO4J IS FASTER THAN MYSQL IN PERFORMING RECURSIVE QUERY   of course is very interesting but if we need to write java code for tuning every typical GraphDB case may be we can write this java (pl/sql, c, asm...) code inside Oracle -:)  And how we can see in preceding point the time of execution of deep=12,13,...25 in ORACLE will be CONSTANT.

See also: Part II: Neo4j vs Oracle Performance Test

#neo4j #oracle #performance

Bonus:

In his article author says: "Also remember that this is a very tiny graph. 1 Million nodes and 10 Million relationships, heading 4 levels deep. Try this with 100 Million nodes, or going 25 levels deep. MySQL, Postgres, Oracle Exadata, all of them will bust and Neo4j will keep on going."

Really? OK. Let's take deep=25 for Oracle...

The time on my MacBook is: 00:03:03.106.  (of course records count is 1504111)

Q.E.D. (quod erat demonstrandum)

And I suspect that Oracle Exadata has a little more hardware resources than my laplop...:)

SQL Query for deep=25 (I guess it may be optimized with 'connect by' but for simplity it seems as +1 select for every deep level):

sdmrnv, 2019-07-08 [3.216ms]