603-Masters-Project-Apache-Spark-

Abstract: Online advertising is a billion-dollar industry with three major players: publishers such as the New York Times and ESPN, which make money by displaying ads on their websites; advertisers, typically product-based companies, which pay to have their products displayed on the publishers' pages; and matchmakers such as Google, Yahoo, and Microsoft, which decide dynamically which ads to display on search and other pages and earn revenue based on how often users click. Since user engagement can easily be as low as 1%, the click-through rate prediction problem aims to estimate the conditional probability that a user will click on an ad, given a massive dataset of predictive features such as ad content, historical performance, and user- and publisher-specific information wherever available. We address the problem using logistic regression and analyze the scalability and efficiency of our solution on a dataset of approximately 40 million rows of anonymized user-ad interaction data.
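As a rough illustration of the approach named in the abstract, the sketch below hashes categorical ad fields into a fixed-size sparse vector (the hashing trick suggested by the file name FeatureHashing.py) and scores it with a logistic model. It is a plain-Python sketch and not the project's Spark code: the field names, the bucket count, the choice of MD5, and the helper names `hash_features` and `predict_click_probability` are all assumptions made for the example.

```python
import hashlib
import math


def hash_features(row, num_buckets=2**18):
    """Map raw categorical fields to sparse bucket counts (the hashing trick)."""
    vec = {}
    for field, value in row.items():
        token = f"{field}={value}".encode("utf-8")
        # A stable hash keeps bucket indices identical across runs and machines.
        index = int(hashlib.md5(token).hexdigest(), 16) % num_buckets
        vec[index] = vec.get(index, 0.0) + 1.0
    return vec


def predict_click_probability(vec, weights, bias=0.0):
    """Logistic regression score: P(click | x) = sigmoid(w . x + b)."""
    z = bias + sum(weights.get(i, 0.0) * v for i, v in vec.items())
    # Branch on the sign of z so math.exp never receives a large positive value.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical ad impression; the field names are illustrative only.
row = {"site_id": "a1b2", "device_type": "1", "banner_pos": "0"}
x = hash_features(row)
p = predict_click_probability(x, weights={})  # untrained model, all weights zero
```

Hashing bounds the feature dimension regardless of how many distinct category values appear in the data, at the cost of occasional collisions between unrelated values.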

Spark can be run under various environment configurations. The parameters that can be varied are the number of nodes, the number of executors, the cores per executor, and the memory per executor. We show that performance depends heavily on the Spark configuration, and we have tried to find the optimal values of these parameters for logistic regression.
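One way to organize such a search is to enumerate candidate configurations as dictionaries of Spark properties. The sketch below builds that grid with the standard property names `spark.executor.instances`, `spark.executor.cores`, and `spark.executor.memory`; the helper name `spark_conf_grid` and the values passed to it are hypothetical and are not the settings used in this project. The node count is assumed to be set by the cluster scheduler (for example in the Slurm job script), so it is not part of the grid.

```python
from itertools import product


def spark_conf_grid(executors, cores_per_executor, memory_gb_per_executor):
    """Return one Spark property dict per combination of the given values."""
    configs = []
    for n, c, m in product(executors, cores_per_executor, memory_gb_per_executor):
        configs.append({
            "spark.executor.instances": str(n),
            "spark.executor.cores": str(c),
            "spark.executor.memory": f"{m}g",
        })
    return configs


# Illustrative values only: 2 or 4 executors, 4 or 8 cores each, 16 GB each.
grid = spark_conf_grid([2, 4], [4, 8], [16])

# Each dict could then be applied before a timed run, for example with
# pyspark.SparkConf().setAll(list(conf.items())).
for conf in grid:
    print(conf)
```

Timing the same logistic regression job under each configuration gives a direct comparison of how the parameters affect performance.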

FeatureHashing.py
README.md
Results,Analysis And Conclusions.pdf
output
slurm.sh

danielnazareth89/603-Masters-Project-Apache-Spark-
