Skip to content

gtessa/hadoop-summary

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 

Repository files navigation

# Copyright 2012 Raj Vishwanathan (rajvish@stoser.com)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

INSTALLATION
	Copy summary.py and tasks.py to a directory of choice.

OVERVIEw
 Provides the summary of a hadoop job based on the default hadoop log.

USAGE:
  summary [-v| --verbose] -j | --job  log-file 

DESCRIPTION
	summary provides a succinct or verbose analyis of a hadoop job based on 
	the log file. The output is a CSV file that can be used by other programs 
	for deeper analysis.
	
	summary is not clever at all. If the log file suits it, it delivers
	reasonable results. If the log file is not suitable it could either survive
	or die a horrible death. 

	More work is needed to make summary a more robust character.

	The log file is assumed to be the default hadoop log. It has not been tested
	with other formats.

OPTIONS
	-j, --job 
		Job log file. Mandatory.
	-v, --verbose 
		Provide analysis of both job and task logs.
	-h, --help	
		Provide a succinct and unhelpful message.

PYTHON
	It has been tested with Python 2.7.2+

About

Summary report using the hadoop log files

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%