fuzzy-id/midas

Midas

This is a collection of scripts that correlate the daily rankings of the one million most visited sites (Alexa Top1M files) with information from CrunchBase. The final goal is to predict, with a lead time of 90 days, whether an enterprise will receive funding of type angel, seed or a.

The code in this repository covers two steps:

  1. Fetching the data from CrunchBase, pre-processing the whole data set (including the Alexa sites) and generating samples.
  2. Using the samples to generate a set of features and running C5.0 to build classifiers from them.

The first step is summarized entirely in the shell script commands_sample_generation.sh.
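As a rough illustration of what sample generation involves, the sketch below joins daily Alexa rankings with a funding date and derives a label. It is not taken from the scripts in this repository: the function names `parse_top1m` and `make_sample`, the data layout, and the labelling rule (a sample is positive if the funding date falls within 90 days after the last observed ranking day) are all assumptions made for the example. The only format it relies on is the `rank,site` line layout of the Alexa Top1M CSV files.

```python
import csv
import io
from datetime import date, timedelta

# 90-day lead time as stated above; how it maps to a label is an assumption.
LEAD_TIME = timedelta(days=90)


def parse_top1m(text):
    """Parse Alexa Top1M CSV content ('rank,site' per line) into {site: rank}."""
    return {site: int(rank) for rank, site in csv.reader(io.StringIO(text))}


def make_sample(site, daily_ranks, funding_date, cutoff):
    """Build one (history, label) sample for a site (hypothetical helper).

    daily_ranks  -- {date: {site: rank}}, one entry per Top1M file
    funding_date -- date of the funding round, or None if there was none
    cutoff       -- last ranking day the classifier is allowed to see
    """
    # Rank history up to the cutoff; rank is None when the site is not listed.
    history = [
        (day, ranks.get(site))
        for day, ranks in sorted(daily_ranks.items())
        if day <= cutoff
    ]
    # Positive if the funding round happens within the lead time after cutoff.
    label = (
        funding_date is not None
        and cutoff < funding_date <= cutoff + LEAD_TIME
    )
    return history, label


daily = {
    date(2012, 1, 1): parse_top1m("1,google.com\n2,example.com\n"),
    date(2012, 1, 2): parse_top1m("1,google.com\n3,example.com\n"),
}
history, label = make_sample(
    "example.com", daily, funding_date=date(2012, 3, 1), cutoff=date(2012, 1, 2)
)
print(history, label)
```

A rank history like this would then feed the feature generation of the second step.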

The second step requires a program that is not publicly available.

About

Mining Alexa Top1M Data
