GitHub - juju-solutions/layer-gobblin: Charm for Gobblin implemented with layers

Overview

"Gobblin is a universal data ingestion framework for extracting, transforming, and loading large volume of data from a variety of data sources, e.g., databases, rest APIs, FTP/SFTP servers, filers, etc., onto Hadoop." from the Gobblin wiki

Usage

This charm uses the Hadoop base layer and the HDFS interface to pull its dependencies and act as a client to a Hadoop namenode. Here is how to deploy the Hadoop infrastructure:

juju quickstart apache-processing-mapreduce

Deploy the Gobblin charm and relate it to the namenode:

juju deploy gobblin
juju add-relation gobblin plugin

Testing the deployment

Smoke test Gobblin

From the Gobblin unit, start the Wikipedia ingestion demo job as the gobblin user:

juju ssh gobblin/0
cd /tmp
sudo su gobblin -c "gobblin-mapreduce.sh --conf wikipedia.pull --jars /usr/lib/gobblin/lib/gobblin-example.jar"

The output will be in HDFS under /user/gobblin/work/job-output/gobblin/example/wikipedia/WikipediaOutput/<Your_Job_Id>. You can set the output directory through the --workdir flag.

List and get the job output file(s) in Avro format:

hdfs dfs -ls /user/gobblin/work/job-output/gobblin/example/wikipedia/WikipediaOutput/<Your_Job_Id>
hdfs dfs -get /user/gobblin/work/job-output/gobblin/example/wikipedia/WikipediaOutput/<Your_Job_Id>/<Path_To_Output>/<Output.avro>
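The output path above is long and easy to mistype, so it may help to build it from the job ID. The following is a minimal sketch, assuming the default work directory quoted in this README; `gobblin_output_dir` and `GOBBLIN_OUTPUT_BASE` are hypothetical names introduced here for illustration and are not part of Gobblin or the charm.

```shell
#!/bin/sh
# Base HDFS output directory as quoted in this README (default work dir).
# Adjust it if the job was started with a different --workdir.
GOBBLIN_OUTPUT_BASE="/user/gobblin/work/job-output/gobblin/example/wikipedia/WikipediaOutput"

# Hypothetical helper: print the HDFS output directory for a given job ID.
gobblin_output_dir() {
    if [ -z "$1" ]; then
        echo "usage: gobblin_output_dir <job-id>" >&2
        return 1
    fi
    printf '%s/%s\n' "$GOBBLIN_OUTPUT_BASE" "$1"
}
```

With the function defined in the current shell, the listing step becomes `hdfs dfs -ls "$(gobblin_output_dir <Your_Job_Id>)"`.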

Transform to JSON:

curl -O http://central.maven.org/maven2/org/apache/avro/avro-tools/1.7.7/avro-tools-1.7.7.jar
java -jar avro-tools-1.7.7.jar tojson --pretty <Output.avro> > output.json
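If the job produced several Avro files, the conversion can be repeated for each one. This is a sketch under stated assumptions: the files were already fetched into a local directory, `java` is on the PATH, and the avro-tools jar downloaded above sits in the current directory. `avro_dir_to_json` and `AVRO_TOOLS_JAR` are hypothetical names used only for this example.

```shell
#!/bin/sh
# Path to the avro-tools jar downloaded in the previous step (assumed location).
AVRO_TOOLS_JAR="${AVRO_TOOLS_JAR:-avro-tools-1.7.7.jar}"

# Hypothetical helper: convert every *.avro file in a directory to a
# pretty-printed .json file with the same base name, next to the original.
avro_dir_to_json() {
    dir="${1:-.}"
    for f in "$dir"/*.avro; do
        # Skip the unexpanded glob when the directory holds no .avro files.
        [ -e "$f" ] || continue
        java -jar "$AVRO_TOOLS_JAR" tojson --pretty "$f" > "${f%.avro}.json"
    done
}
```

Calling `avro_dir_to_json .` converts the files in the current directory; a directory without `.avro` files is left untouched.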

Contact Information

bigdata@lists.ubuntu.com
