Impala Workload Analyzer

Parses Thrift-formatted Impala query profiles and analyzes the characteristics of the workloads they describe.

USAGE

bash driver.sh tag inputFile|inputDir outputDir >>analysis.log 2>&1
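The command above runs every analysis step and generates the graphs; stdout and stderr are appended to analysis.log.
If the input is a directory, make sure it contains only profile files.
To run a single component instead of the whole driver script, invoke it directly (see driver.sh for the individual commands).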
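The "only profile files" requirement can be checked before a run. The sketch below is not part of the repository: `unexpected_entries` is a hypothetical helper, and because this README does not say how to recognize a profile file, it only flags entries that clearly are not one (subdirectories and hidden files).

```python
import os


def unexpected_entries(input_dir):
    """Return the names in input_dir that are clearly not profile files.

    Hypothetical helper: the README does not define a profile file format
    check, so this only flags subdirectories and hidden files.
    """
    flagged = []
    for name in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, name)
        # Anything that is not a regular file, or that is hidden
        # (e.g. .DS_Store), should be moved out before running driver.sh.
        if not os.path.isfile(path) or name.startswith("."):
            flagged.append(name)
    return flagged
```

An empty list means nothing obviously wrong was found; it does not prove that every remaining file is a valid profile.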

DIRECTORY STRUCTURE

.
|— README.md
|— data: example profiles
|— code
    |— driver.sh
    |— java
        |— src/com/cloudera/impala/analysis/QueryAnalyzer.java: parse SQL
        |— src/com/cloudera/impala/analysis/QueryStats.java: SQL statistics
        |— src/com/cloudera/impala/analysis/Parameterizer.java: parameterize SQL
    |— python
        |— RuntimeProfile.thrift: definition of thrift-formatted profiles
        |— analyze_profiles.py: analyze one profile file
        |— profile_analyzer.py: analyze one query profile
        |— stats.py: generate graphs for a workload
        |— joins.py: check the quality of joins
        |— aggs.py: check the quality of pre-aggregations
        |— plots.py: helper file to draw graphs
        |— delete_queries_with_tag.py: delete queries with a particular tag
        |— clustering.py: group queries into clusters

RUN ON vd0204.halxg.cloudera.com

1. Make sure MongoDB is running
2. Activate the virtualenv (source venv/bin/activate)
3. Either clear the whole database (in the mongo shell: use impala; db.dropDatabase()) or delete only the queries with a particular tag (python delete_queries_with_tag.py tag)
4. Run driver.sh (see USAGE)
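Step 1 can be verified from Python before starting the driver. This is a minimal sketch, not code from the repository: `mongodb_is_listening` is a hypothetical helper, and it assumes MongoDB listens on localhost at its default port 27017, which this README does not state. It only confirms that something accepts TCP connections there, not that the impala database exists.

```python
import socket


def mongodb_is_listening(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Assumption: MongoDB runs on the default port 27017 on this machine.
    """
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if not mongodb_is_listening():
        raise SystemExit("MongoDB does not appear to be running")
```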
