
yuan776/ethz-data-mining

 
 


Data Mining

Important URLs

Usage

# Start and stop Hadoop
/usr/local/Cellar/hadoop121/1.2.1/bin/start-all.sh
/usr/local/Cellar/hadoop121/1.2.1/bin/stop-all.sh

# Hadoop installation directory
/usr/local/Cellar/hadoop121/1.2.1

# Copy data to HDFS
hadoop dfs -copyFromLocal /Users/lukas/data-mining/example/input /user/hduser/example

# Run the job
# Mapper and reducer paths are local; input and output paths are on HDFS
hadoop jar ~/.bin/hadoop-streaming-1.2.1.jar \
-mapper /Users/lukas/data-mining/example/mapper.py \
-reducer /Users/lukas/data-mining/example/reducer.py \
-input "/user/hduser/example/*" \
-output /user/hduser/example-output

# List the output files and print the results
hadoop dfs -ls /user/hduser/example-output
hadoop dfs -cat /user/hduser/example-output/part-00000

# Copy the results from HDFS to a local directory
hadoop dfs -copyToLocal /user/hduser/example-output /Users/lukas/data-mining/example/output

# Delete the HDFS output directory
# (Hadoop refuses to run a job whose output directory already exists)
hadoop dfs -rmr /user/hduser/example-output
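
The commands above reference a `mapper.py` and a `reducer.py` without showing them. As a rough illustration of what a Hadoop Streaming mapper and reducer can look like, here is a minimal word-count sketch. It is an assumption, not the code from this repository: the function names `map_lines` and `reduce_lines`, the `map`/`reduce` command-line argument, and the word-count task itself are all hypothetical. It relies only on the Streaming convention that records are read from stdin and written to stdout as tab-separated key/value lines, and that the reducer receives its input sorted by key.

```python
#!/usr/bin/env python
"""Hypothetical word-count mapper and reducer for Hadoop Streaming."""
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_lines(lines):
    """Sum the counts per key; assumes the input is already sorted by key."""
    # Skip malformed lines that carry no tab separator.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if "\t" in line)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield "%s\t%d" % (word, total)


if __name__ == "__main__":
    # Assumed convention for this sketch: pass "map" or "reduce" as the only argument.
    mode = sys.argv[1] if len(sys.argv) > 1 else ""
    steps = {"map": map_lines, "reduce": reduce_lines}
    if mode in steps:
        for record in steps[mode](sys.stdin):
            print(record)
```

Saved under a hypothetical name such as `wordcount.py`, the sketch can be checked without Hadoop by piping a local file through `python wordcount.py map`, then `sort`, then `python wordcount.py reduce`, which imitates the shuffle step between the two phases.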

About

ETH Data Mining Class
