wellbook

The wellbook concept is a single view of an oil well and its history, something akin to a "Facebook Wall" for oil wells.

This repo is built from data collected and made available by the North Dakota Industrial Commission.

I used the wellindex.csv file to obtain a list of well file numbers (file_no), scraped each well's Production, Injection, and Scout Ticket web pages along with any available LAS-format well log files, and loaded everything into HDFS (/user/dev/wellbook/) for analysis.
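As a rough illustration of the first step, here is a minimal Python 3 sketch that pulls the distinct file numbers out of a CSV. It assumes the column header is literally `file_no`; the `well_name` column and the sample rows are made up, and `read_file_numbers` is a hypothetical helper, not code from this repo.

```python
import csv
import io


def read_file_numbers(csv_file):
    """Return the distinct, non-empty file_no values in first-seen order."""
    seen = set()
    file_numbers = []
    for row in csv.DictReader(csv_file):
        # Assumption: the well file number lives in a column named "file_no"
        file_no = (row.get("file_no") or "").strip()
        if file_no and file_no not in seen:
            seen.add(file_no)
            file_numbers.append(file_no)
    return file_numbers


# Made-up rows standing in for wellindex.csv
sample = io.StringIO(
    "file_no,well_name\n"
    "101,Example A\n"
    "102,Example B\n"
    "101,Example A\n"
)
file_numbers = read_file_numbers(sample)
print(file_numbers)
```

With the real file, something like `read_file_numbers(open("wellindex.csv", newline=""))` would give the list of wells to scrape.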

To avoid the HDFS small files problem, I used the Apache Mahout seqdirectory tool to combine my text files into SequenceFiles: the keys are the filenames and the values are the contents of each text file.
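The key/value layout described above can be pictured with a small Python 3 sketch. It does not write Hadoop SequenceFiles or call Mahout; it only builds the same (filename, contents) pairs in memory. The helper name `files_as_key_value_pairs` and the sample filenames are hypothetical.

```python
import os
import tempfile


def files_as_key_value_pairs(directory):
    """Yield (filename, contents) pairs, one per file, sorted by name."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as handle:
                yield name, handle.read()


# Demo with two made-up scraped pages in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("101.html", "<html>a</html>"), ("102.html", "<html>b</html>")]:
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as handle:
            handle.write(text)
    pairs = list(files_as_key_value_pairs(tmp))

print(pairs)
```

Packing many small files into a few large key/value containers like this keeps the number of HDFS objects low while still letting each record be traced back to its source file.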

Then I used a combination of Hive queries and the pyquery Python library to parse the relevant fields out of the raw HTML pages.
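To give a feel for the field extraction, here is a minimal Python 3 sketch. To stay dependency-free it uses the standard library's `html.parser` instead of pyquery, so it is not the code in this repo. The page layout (a table of alternating label and value cells) and the field names are assumptions made up for the example.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of every <td> or <th> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
            self._parts = []

    def handle_data(self, data):
        if self._in_cell:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self.cells.append("".join(self._parts).strip())
            self._in_cell = False


def parse_label_value_cells(html):
    """Pair each label cell with the value cell that follows it."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    cells = parser.cells
    return dict(zip(cells[0::2], cells[1::2]))


# Hypothetical fragment; the real Scout Ticket markup may differ
sample_html = (
    "<table>"
    "<tr><td>File No</td><td>101</td></tr>"
    "<tr><td>Operator</td><td>Example Oil Co</td></tr>"
    "</table>"
)
fields = parse_label_value_cells(sample_html)
print(fields)
```

A selector-based library such as pyquery makes this kind of extraction shorter, since cells can be picked with CSS selectors rather than tracked by hand.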

Tables:
wellbook.wells -- well metadata including geolocation and owner
wellbook.well_surveys -- borehole curve
wellbook.production -- how much oil, gas, and water was produced for each well on a monthly basis
wellbook.auctions -- how much was paid for each parcel of land at auction
wellbook.injections -- how much fluid and gas was injected into each well (for enhanced oil recovery and disposal purposes)
wellbook.log_metadata -- metadata for each LAS well log file
wellbook.log_readings -- sensor readings for each depth step in all LAS well log files
wellbook.log_key -- map of log mnemonics to their descriptions
wellbook.formations -- manually annotated map of well depths to rock formations
wellbook.formations_key -- descriptions of rock formations
wellbook.water_sites -- metadata for water quality monitoring stations in North Dakota

Setup:

git clone https://github.com/randerzander/wellbook

#Prereqs
sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
sudo yum groupinstall -y 'development tools'
sudo yum install -y apache-maven mahout
#for python libs
sudo yum install -y python-devel libxslt-devel blas-devel lapack-devel gcc-gfortran
#JAVA_HOME must point at the JDK root, not its bin directory
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64' >> ~/.bashrc

#Download and install virtualenv
wget https://bootstrap.pypa.io/ez_setup.py
sudo python ez_setup.py
sudo easy_install pip
sudo pip install virtualenv

#Create a relocatable Python virtualenv
virtualenv ~/wellbook/pyenv
source ~/wellbook/pyenv/bin/activate
pip install pyquery numpy scipy scikit-learn
cp ~/wellbook/etl/lib/recordhelper.py ~/wellbook/pyenv/lib/python2.6/site-packages/
deactivate
virtualenv --relocatable ~/wellbook/pyenv

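#Usage: mvn_package <repo_url> <repo_name> <target_dir>
function mvn_package(){
  git clone "$1"
  mv "$2" "$3/"
  #Build in a subshell so the caller's working directory is unchanged
  (cd "$3/$2" && mvn package)
}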
#Download and build the custom Hive InputFormat
mvn_package https://github.com/randerzander/SequenceFileKeyValueInputFormat SequenceFileKeyValueInputFormat ~/wellbook

#Download and build necessary Hive UDFs
mkdir ~/wellbook/udfs
mvn_package https://github.com/Esri/spatial-framework-for-hadoop spatial-framework-for-hadoop ~/wellbook/udfs
mvn_package https://github.com/randerzander/CurveUDFs CurveUDFs ~/wellbook/udfs

#Download and build necessary Hive SerDes
mkdir ~/wellbook/serdes
mvn_package https://github.com/ogrodnek/csv-serde csv-serde ~/wellbook/serdes

cd ~/
#Sets up HDFS folder structure
sh ~/wellbook/scripts/hdfs_setup.sh
#Sets up Hive tables
sh ~/wellbook/scripts/hive_setup.sh
