amergin/neo4j-import

INSTALLATION

Dependencies:

RUNNING THE SCRIPT

  1. Edit batch_import.config to suit your needs
  2. Run the importer by issuing ./batchimporter.py batch_import.config
  3. Once it has finished, copy the resulting db directory to your Neo4j data directory
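
Step 3 can also be scripted. The following is a minimal sketch, not part of the repository: the function name copy_db is hypothetical, and the default target name graph.db is an assumption about how your Neo4j installation names its database directory. The sketch refuses to overwrite an existing database.

```python
import shutil
from pathlib import Path


def copy_db(src_dir, neo4j_data_dir, db_name="graph.db"):
    """Copy the imported database directory into a Neo4j data directory.

    db_name is an assumption; use whatever directory name your server expects.
    Raises instead of overwriting, so an existing graph is never clobbered.
    """
    src = Path(src_dir)
    dest = Path(neo4j_data_dir) / db_name
    if not src.is_dir():
        raise FileNotFoundError(f"import output not found: {src}")
    if dest.exists():
        raise FileExistsError(f"target already exists: {dest}")
    shutil.copytree(src, dest)
    return dest
```

For example, copy_db("targetdb", "/path/to/neo4j/data") would copy the targetdb directory shown in the example run below; the second path is a placeholder. Stop the Neo4j server before copying.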

EXAMPLE RUN

bash-4.1$ ./batchimporter.py batch_import.config
Data import started at 2012-09-21 20:48:06.795819
Creating MySQL dumps
Creating MySQL dumps for labelname 'cbm_pc_quantrev_0706'
MySQL dump file at /home/lineo4j/import/gitrepo/dumps/cbm_pc_quantrev_0706_nodes.tsv exists, skipping.
MySQL dump file at /home/lineo4j/import/gitrepo/dumps/cbm_pc_quantrev_0706_edges.tsv exists, skipping.
MySQL dump file at /home/lineo4j/import/gitrepo/dumps/cbm_pc_quantrev_0706_patients.tsv exists, skipping.
Creating MySQL dumps for labelname 'crc_31july'
MySQL dump file at /home/lineo4j/import/gitrepo/dumps/crc_31july_nodes.tsv exists, skipping.
MySQL dump file at /home/lineo4j/import/gitrepo/dumps/crc_31july_edges.tsv exists, skipping.
MySQL dump file at /home/lineo4j/import/gitrepo/dumps/crc_31july_patients.tsv exists, skipping.
Creating MySQL dumps for labelname 'brca_public_0613'
MySQL dump file at /home/lineo4j/import/gitrepo/dumps/brca_public_0613_nodes.tsv exists, skipping.
MySQL dump file at /home/lineo4j/import/gitrepo/dumps/brca_public_0613_edges.tsv exists, skipping.
MySQL dump file at /home/lineo4j/import/gitrepo/dumps/brca_public_0613_patients.tsv exists, skipping.
Creating MySQL dumps for labelname 'crc_noroi_1807'
MySQL dump file at /home/lineo4j/import/gitrepo/dumps/crc_noroi_1807_nodes.tsv exists, skipping.
MySQL dump file at /home/lineo4j/import/gitrepo/dumps/crc_noroi_1807_edges.tsv exists, skipping.
MySQL dump file at /home/lineo4j/import/gitrepo/dumps/crc_noroi_1807_patients.tsv exists, skipping.
Finished creating MySQL dumps
Creating the Patient Barcode TSV files
Finished creating Patient Barcode TSV files
Starting to create node files.
Node files created.
Starting to create edge files
Creating edge files for dataset 'cbm_pc_quantrev_0706'
Creating edge files for dataset 'crc_31july'
Creating edge files for dataset 'brca_public_0613'
Creating edge files for dataset 'crc_noroi_1807'
Edge files created.
Creating Neo4j Database
Using Existing Configuration File
..
Importing 204099 Nodes took 38 seconds 
.................................................................................................... 87686 ms for 10000000
.................................................................................................... 276035 ms for 10000000
...........
Importing 21134392 Relationships took 412 seconds 

Importing 29595 Done inserting into cbm_pc_quantrev_0706_i_n_GEXP Index took 10 seconds 

Importing 5000 Done inserting into cbm_pc_quantrev_0706_i_n_METH Index took 5 seconds 

Importing 18 Done inserting into cbm_pc_quantrev_0706_i_n_CLIN Index took 0 seconds 

Importing 19872 Done inserting into cbm_pc_quantrev_0706_i_n_CNVR Index took 1 seconds 

Importing 0 Done inserting into cbm_pc_quantrev_0706_i_n_RPPA Index took 0 seconds 

Importing 0 Done inserting into cbm_pc_quantrev_0706_i_n_GNAB Index took 0 seconds 

Importing 0 Done inserting into cbm_pc_quantrev_0706_i_n_MIRN Index took 0 seconds 

Importing 0 Done inserting into cbm_pc_quantrev_0706_i_n_SAMP Index took 0 seconds 

Importing 15399 Done inserting into crc_31july_i_n_GEXP Index took 4 seconds 

Importing 21166 Done inserting into crc_31july_i_n_METH Index took 1 seconds 

Importing 73 Done inserting into crc_31july_i_n_CLIN Index took 0 seconds 

Importing 9674 Done inserting into crc_31july_i_n_CNVR Index took 0 seconds 

Importing 171 Done inserting into crc_31july_i_n_RPPA Index took 0 seconds 

Importing 17256 Done inserting into crc_31july_i_n_GNAB Index took 1 seconds 

Importing 519 Done inserting into crc_31july_i_n_MIRN Index took 0 seconds 

Importing 164 Done inserting into crc_31july_i_n_SAMP Index took 0 seconds 

Importing 5485 Done inserting into brca_public_0613_i_n_GEXP Index took 0 seconds 

Importing 4982 Done inserting into brca_public_0613_i_n_METH Index took 0 seconds 

Importing 61 Done inserting into brca_public_0613_i_n_CLIN Index took 0 seconds 

Importing 3831 Done inserting into brca_public_0613_i_n_CNVR Index took 0 seconds 

Importing 165 Done inserting into brca_public_0613_i_n_RPPA Index took 0 seconds 

Importing 5482 Done inserting into brca_public_0613_i_n_GNAB Index took 0 seconds 

Importing 605 Done inserting into brca_public_0613_i_n_MIRN Index took 0 seconds 

Importing 315 Done inserting into brca_public_0613_i_n_SAMP Index took 0 seconds 

Importing 15399 Done inserting into crc_noroi_1807_i_n_GEXP Index took 1 seconds 

Importing 21166 Done inserting into crc_noroi_1807_i_n_METH Index took 1 seconds 

Importing 73 Done inserting into crc_noroi_1807_i_n_CLIN Index took 0 seconds 

Importing 9518 Done inserting into crc_noroi_1807_i_n_CNVR Index took 0 seconds 

Importing 171 Done inserting into crc_noroi_1807_i_n_RPPA Index took 0 seconds 

Importing 17256 Done inserting into crc_noroi_1807_i_n_GNAB Index took 0 seconds 

Importing 519 Done inserting into crc_noroi_1807_i_n_MIRN Index took 0 seconds 

Importing 164 Done inserting into crc_noroi_1807_i_n_SAMP Index took 0 seconds 

Total import time: 489 seconds 
Finished creating Neo4j Database
Creating info nodes
Meta node created, id 204100
Infonodes created
Data import ended at 2012-09-21 21:06:23.105750
Import time 1096.31 seconds, 4 datasets

bash-4.1$ du -csh targetdb/
2.9G    targetdb/
2.9G    total
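
The index lines in a run like the one above are easy to check by script, for instance to spot indexes that received no entries (several cbm_pc_quantrev_0706 indexes above are empty). This is a rough sketch that assumes the log keeps exactly the "Importing N Done inserting into NAME Index took S seconds" wording shown here; the function names are hypothetical.

```python
import re

# Matches lines such as:
#   Importing 29595 Done inserting into cbm_pc_quantrev_0706_i_n_GEXP Index took 10 seconds
INDEX_LINE = re.compile(
    r"Importing (\d+) Done inserting into (\S+) Index took (\d+) seconds"
)


def summarize_indexes(log_text):
    """Return {index_name: (entries, seconds)} for every index line in the log."""
    summary = {}
    for match in INDEX_LINE.finditer(log_text):
        count, name, seconds = match.groups()
        summary[name] = (int(count), int(seconds))
    return summary


def empty_indexes(summary):
    """Return the sorted names of indexes that received no entries."""
    return sorted(name for name, (count, _) in summary.items() if count == 0)
```

Feeding the captured console output to summarize_indexes and then empty_indexes gives a quick sanity check before the database is copied to the server.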

OTHER NOTES

Copy the resulting directory to the Neo4j data directory, run Neo4j in server mode, and make sure the graph structure is loaded into the high-level cache (including the property linked lists for each element). To do that, set the cache type to strong (http://docs.neo4j.org/chunked/stable/configuration-caches.html), set a heap size large enough to hold the graph, and run the Cypher queries

START a=node(*) WHERE a.label? <> "A" RETURN COUNT(a);
START a=relationship(*) WHERE a.pvalue? < 0 RETURN COUNT(a);

prior to using the graph demo. The first query visits every node and the second every relationship, and each reads a property of the element it visits, so all elements and their properties end up in the high-level cache.
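
The two warm-up queries can also be sent from a script after the server starts. The sketch below is an illustration, not part of this repository. It assumes the server exposes the legacy REST Cypher endpoint at /db/data/cypher on http://localhost:7474 and accepts a JSON body with "query" and "params" keys; verify both against your Neo4j version. The function names are hypothetical, and the query strings are copied verbatim from above.

```python
import json
import urllib.request

# Warm-up queries copied verbatim from the notes above.
WARMUP_QUERIES = [
    'START a=node(*) WHERE a.label? <> "A" RETURN COUNT(a);',
    "START a=relationship(*) WHERE a.pvalue? < 0 RETURN COUNT(a);",
]


def build_request(query, base_url="http://localhost:7474"):
    """Build a POST request for the (assumed) /db/data/cypher endpoint."""
    body = json.dumps({"query": query, "params": {}}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/db/data/cypher",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


def warm_cache(base_url="http://localhost:7474"):
    """Send both warm-up queries and return the decoded JSON responses."""
    results = []
    for query in WARMUP_QUERIES:
        with urllib.request.urlopen(build_request(query, base_url)) as response:
            results.append(json.loads(response.read().decode("utf-8")))
    return results
```

Calling warm_cache() once after each server start should have the same effect as running the two queries by hand. On a graph of the size shown in the example run, expect the relationship query to take a while.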

About

Neo4j importing scripts for RF-ACE MySQL datasets
