Skip to content

Yagniksuchak/CodeParser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GitCProc

GitCProc is a tool to extract changes to elements in code and associate them with their surrounding functions. It takes git diff logs as input and produces a table mapping the additions and deletions of the elements to a 4-tuple (project, sha, file, function/method) specifying the commit and location in the code of changed elements. It also analyzes commit messages to estimate if the commits were bug fixing or not.

Currently, we have designed it to work with C/C++/Java, but it will likely work for languages with similar functional/method structure that use { ... } to enclose blocks.

#Required Libraries GitCProc runs on python 2.7 and requires the following libraries: -psycopg2 -nltk

If you wish to output your results to a database instead of a csv file, you will need a postgres server installed.

#Running Instructions

The tool works by parsing git log files produced with the -W (function-content ) to the tool along with a few other options specified in either util.py or a config.ini file. The util.py file specifies whether to output to csv or database, record debug information, and points to the config.ini file. The config.ini file specifies the tables in the database (currently must be created manually before) and the location of the keywords file.

The keywords file specifies what structures you wish to track. It is formatted in the following manner:
[keyword_1], [INCLUDED/EXCLUDED], [SINGLE/BLOCK]
...
[keyword_N],[INCLUDED/EXCLUDED], [SINGLE/BLOCK]

The first part is the keyword you wish to track. For instance, if you are tracking assert statements, one keyword of interest would be "assert", if you were tracking try-catch blocks, you might have "try" and "catch". The second part specifies whether the keyword is being searched for (INCLUDED) or specifically ignored (EXCLUDED). For example, if you are tracking assertions, but you wish to exclude headers, you would have a line like "assert.h,EXCLUDED,SINGLE". The last part specifies whether the keyword refers to a single line coding element or a multiple line element. A single line element might be "println", where as "for" with a BLOCK designation will track all the contents of the for loop encased by {...}.

python ghProc.py [git_log]

#Running Tests

There are currently two test scripts, which can be run as: python logChunkTest.py (Tests 2, 13, 23, 31, 36 fail) python ghLogDbTest.py (Tests 4 and 7)

The failing tests are the result of not currently supporting the tracking of deleted functions.

#Performance Updated soon...

#Upcoming Features

  • Handling of deleted functions
  • Automated download of repositories from Github and log creation
  • Better authorship recording (author + committer)
  • Output of line contents and not just counts of lines changed.
  • ...

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published