This is a collection of scripts used to correlate the daily rankings of
the one million most visited sites (Alexa Top1M files) with information
from CrunchBase. The final goal is to predict, with a lead time of
90 days, whether an enterprise will receive funding of type angel,
seed, or a.
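As a rough illustration of the prediction target, the sketch below shows one possible reading of the 90-day lead time: only data observed at least 90 days before a funding round may be used to predict it. The names TARGET_ROUND_TYPES, is_target_round and last_usable_day are hypothetical and do not come from this repository, and the string encoding of the round types is an assumption.

```python
from datetime import date, timedelta

# Assumption: funding round types are encoded as the strings named in the text.
TARGET_ROUND_TYPES = {"angel", "seed", "a"}

# Lead time stated in the text.
LEAD_TIME = timedelta(days=90)


def is_target_round(round_type: str) -> bool:
    """Return True if the round type is one of the types to be predicted."""
    return round_type.strip().lower() in TARGET_ROUND_TYPES


def last_usable_day(funding_date: date, lead_time: timedelta = LEAD_TIME) -> date:
    """Return the latest day whose data may be used to predict a round
    on funding_date, under the assumed reading of the lead time."""
    return funding_date - lead_time


if __name__ == "__main__":
    print(is_target_round("seed"))
    print(last_usable_day(date(2015, 4, 1)))
```

Under this reading, rankings recorded after the returned day would be excluded when building a sample for that funding round.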
The code in this repository covers two steps:
- Fetching the data from CrunchBase, pre-processing the whole data set (including the Alexa sites), and generating samples.
- Using the samples to generate a set of features and running C5.0 to build classifiers from them.
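To give an idea of what generating features from a sample could look like, here is a minimal sketch that summarizes a site's daily rank series. It is not the feature set used by the scripts in this repository: the function rank_features, the chosen features, and the UNRANKED placeholder for days on which a site is missing from the Top1M file are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical placeholder rank for days on which the site is not in the top million.
UNRANKED = 1_000_001


def rank_features(daily_ranks):
    """Summarize a rank series (oldest first, one entry per day).

    Each entry is an int rank, or None if the site was not listed that day.
    """
    if not daily_ranks:
        raise ValueError("daily_ranks must contain at least one day")

    # Replace missing days with the placeholder so that all values are comparable.
    filled = [r if r is not None else UNRANKED for r in daily_ranks]

    return {
        "last_rank": filled[-1],
        "best_rank": min(filled),  # a lower rank means more visits
        "mean_rank": mean(filled),
        "rank_change": filled[-1] - filled[0],  # negative means the site improved
        "days_ranked": sum(r is not None for r in daily_ranks),
    }


if __name__ == "__main__":
    print(rank_features([400, 200, 300]))
```

A dictionary like this, one per sample, could then be written out in whatever input format the classifier expects.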
The first step is summarized entirely in the shell script commands_sample_generation.sh.
The second step requires a program that is not publicly available.