Revisiting Process versus Product Metrics: a Large-Scale Analysis

Numerous methods can build predictive models from software data. But what methods and conclusions should we endorse as we move from analytics in-the-small (dealing with a handful of projects) to analytics in-the-large (dealing with hundreds of projects)?

To answer this question, we recheck prior small-scale results (about process versus product metrics for defect prediction and the granularity of metrics) using 722,471 commits from 700 GitHub projects. We find that some analytics in-the-small conclusions still hold when scaling up to analytics in-the-large. For example, like prior work, we see that process metrics are better predictors for defects than product metrics (the best process-based and product-based learners achieve median recalls of 98% and 44%, and median AUCs of 95% and 54%, respectively).
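As a reminder of what the two reported measures mean, here is a minimal sketch of recall and AUC using their standard textbook definitions. It is not taken from this repository's code; the function names and the toy labels and scores are illustrative assumptions.

```python
def recall(y_true, y_pred):
    """Fraction of truly defective items (label 1) that were predicted defective."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(y_true, scores):
    """Probability that a random defective item is scored above a random
    clean item; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = defective, 0 = clean.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]            # hard labels from some learner
scores = [0.9, 0.4, 0.3, 0.6, 0.8]  # that learner's defect scores

toy_recall = recall(y_true, y_pred)
toy_auc = auc(y_true, scores)
```

A recall of 98% therefore means that almost every defective item was flagged, while an AUC near 50% (as reported for product metrics) is close to what random ranking would give.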

That said, we warn that it is unwise to trust metric importance results from analytics in-the-small studies, since those change dramatically when moving to analytics in-the-large. Also, when reasoning in-the-large about hundreds of projects, it is better to use predictions from multiple models (since single-model predictions can become confused and exhibit a high variance).
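One simple way to combine predictions from multiple models is a majority vote. The sketch below assumes that combination rule for illustration only; the text above does not say how the study combines its models, and the function name and toy predictions are hypothetical.

```python
def majority_vote(predictions):
    """Combine per-model 0/1 predictions into one label per item.

    predictions: list of equal-length lists, one list per model.
    An item is labelled 1 only if strictly more than half the models say 1.
    """
    n_models = len(predictions)
    return [1 if 2 * sum(votes) > n_models else 0
            for votes in zip(*predictions)]


# Toy predictions from three hypothetical models over four items.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]

combined = majority_vote([model_a, model_b, model_c])
```

Because each item's label depends on several models, a single model's mistake on an item is outvoted whenever the other models agree, which is the usual argument for why combined predictions vary less than single-model predictions.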

data
results
src
.DS_Store
.gitignore
A_large_scale_study_on_importance_of_process_vs_product_metrics.pdf
LICENSE
README.md
RQ1_RQ2.sh
RQ3.sh
RQ4.sh
RQ5.sh
RQ6.sh
RQ7.sh
RQ8.sh
Untitled.ipynb
pre.sh
projects.csv


Suvodeep90/Revisit_process_product
