This repository has been archived by the owner on Jan 27, 2024. It is now read-only.

floydwch/the-most-influential-developers-on-github

The Most Influential Developers on GitHub -- GitHub Data Challenge 2014

There are many developers on GitHub, and following the influential ones is worthwhile because they tend to spread promising repositories. Influential developers can promote a repository on GitHub simply by starring it, since their followers may then star it in turn. This survey applied the well-known PageRank algorithm to watch events from the GitHub Archive and to users' following relationships from the GitHub API in order to find the most influential developers on GitHub.

Disclaimer

The results are based on limited data (2014/1/1 ~ 2014/8/26) and do not represent GitHub. The ranking may change as more data is collected.

The Result

Data Collection

Watch events were collected from the GitHub Archive for 2014/1/1 to 2014/8/26; from each event, the repository's name, the actor's name, and the time the event was created were extracted. Users' following relationships were collected from the GitHub API. To collect the data, run python task_grab_watch_events. Make sure MongoDB is already running; this task creates a database named github.
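A minimal sketch of the extraction step, assuming the archive files have been decompressed into one JSON event per line and that 2014 events use the older timeline layout, where `actor` is a login string and `repository` is an object with `owner` and `name`. The names `WatchEvent` and `extract_watch_events` are hypothetical, and the MongoDB insert performed by task_grab_watch_events is left out.

```python
import json
from typing import Iterable, Iterator, NamedTuple


class WatchEvent(NamedTuple):
    """One extracted watch event (hypothetical record type)."""
    created_at: str  # time the event was created, ISO 8601 string
    repo: str        # repository full name, "owner/name"
    actor: str       # login of the user who starred the repository


def extract_watch_events(lines: Iterable[str]) -> Iterator[WatchEvent]:
    """Yield (created_at, repo, actor) for every WatchEvent in JSON-lines input."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("type") != "WatchEvent":
            continue  # only watch (star) events matter here
        # Assumed 2014 timeline layout; later archives nest these fields
        # differently (e.g. event["repo"]["name"], event["actor"]["login"]).
        repo = event["repository"]
        yield WatchEvent(
            created_at=event["created_at"],
            repo=f'{repo["owner"]}/{repo["name"]}',
            actor=event["actor"],
        )
```

Each yielded record would then become one document in the github database.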

GitHub API User Login

Since the task calls the GitHub API, add the login names and passwords of your robot accounts to config.py in the same directory.
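The variable names config.py expects are not documented here, so the following is only a hypothetical sketch: a list of `(login, password)` pairs for the robot accounts, plus a round-robin helper that presumably spreads API requests across them so that no single account exhausts its rate limit. Note that GitHub has since stopped accepting account passwords for API authentication, so a personal access token would take the password's place today.

```python
import itertools
from typing import Iterator, List, Tuple

# Hypothetical contents of config.py: one (login, password) pair per robot account.
ROBOTS: List[Tuple[str, str]] = [
    ("robot-one", "secret-1"),
    ("robot-two", "secret-2"),
]


def robot_credentials(robots: List[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Cycle through the robot accounts endlessly, one per API request."""
    if not robots:
        raise ValueError("config.py must list at least one robot account")
    return itertools.cycle(robots)
```

A caller would create `creds = robot_credentials(ROBOTS)` once and take `login, password = next(creds)` before each request.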

Build Graphs

To build the graphs, make sure the watch events have already been collected into MongoDB, then run python task_gen_events_graphs. Each watch event on a repository is represented as a vertex, the 3-tuple (event's created time, repository's name, actor's name). Each vertex has directed edges to the vertices formed by the watch events of the users its actor follows, provided those users also starred the repository earlier. In other words, a graph represents the cascade of one repository's watch events, and all of GitHub's repositories together form many such graphs. In addition, the owner of the repository receives edges from followers who starred the repository, to capture the influence of publishing open source.
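A minimal sketch of one repository's cascade graph using plain Python lists rather than the MongoDB-backed code in task_gen_events_graphs. Edges point from the later (influenced) event to the earlier (influencing) one, so PageRank credits the influencer. How the owner is represented as a vertex is not specified above; stamping it with the repository's creation time is an assumption, as are all function and parameter names.

```python
from typing import Dict, Iterable, List, Set, Tuple

# A vertex is one watch event: (created_at, repository full name, actor login).
Vertex = Tuple[str, str, str]
Edge = Tuple[Vertex, Vertex]  # (influenced event, influencing event)


def build_cascade(
    repo: str,
    stars: Iterable[Tuple[str, str]],
    following: Dict[str, Set[str]],
    owner: str,
    repo_created_at: str,
) -> Tuple[List[Vertex], List[Edge]]:
    """Build the watch-event cascade graph of one repository.

    stars: (created_at, actor) pairs for the repository's watch events.
    following: login -> set of logins that user follows.
    Timestamps are assumed to be ISO 8601 strings in one time zone, so
    string comparison matches chronological order.
    """
    vertices: List[Vertex] = sorted((t, repo, actor) for t, actor in stars)
    edges: List[Edge] = []
    for i, later in enumerate(vertices):
        followed = following.get(later[2], set())
        # Link to earlier stargazers of this repository whom the actor follows.
        for earlier in vertices[:i]:
            if earlier[2] in followed and earlier[0] < later[0]:
                edges.append((later, earlier))
    # Hypothetical owner vertex, stamped with the repository's creation time.
    owner_vertex: Vertex = (repo_created_at, repo, owner)
    for v in vertices:
        if owner in following.get(v[2], set()):
            edges.append((v, owner_vertex))  # follower starred owner's repo
    if any(dst == owner_vertex for _, dst in edges):
        vertices.insert(0, owner_vertex)
    return vertices, edges
```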

Edge Weighting

Assuming an actor's ability to influence followers fades over time, each edge is weighted by a Fibonacci decay, 1.0 / fib(interval + 2), where fib is the Fibonacci sequence starting from 0 and interval is measured in days. The longer the interval between two events, the weaker the connection between them.
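With fib(0) = 0 and fib(1) = 1, the weight is 1.0 for events on the same day, 1/2 after one day, 1/3 after two days, and 1/5 after three. A minimal sketch of the weighting; flooring partial days to whole days is an assumption, and `edge_weight` is a hypothetical name.

```python
from datetime import datetime


def fib(n: int) -> int:
    """n-th Fibonacci number, with fib(0) = 0 and fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def edge_weight(earlier: datetime, later: datetime) -> float:
    """Fibonacci decay 1.0 / fib(interval + 2), interval in whole days."""
    interval = (later - earlier).days  # assumption: partial days are floored
    if interval < 0:
        raise ValueError("later event precedes earlier event")
    return 1.0 / fib(interval + 2)
```

Compared with exponential decay, the Fibonacci denominators grow by roughly the golden ratio per day, so the weight still falls off quickly while staying exactly 1.0 for same-day stars.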

Calculate the Influence

Run python task_cal_pagerank, then python task_cal_influence. Because the cascade of watch events in each repository forms a directed graph, PageRank can score the influence among the users in it; a user's overall influence is then obtained by combining the scores the user received from every graph they are involved in. To reduce noise, scores equal to exactly 1 were removed before combining.
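The combination rule is not spelled out above, so this sketch assumes a simple sum of each user's normalized scores across graphs. Presumably a score of exactly 1 marks the least-ranked vertex once each graph is divided by its least PageRank (see Normalized PageRank), which is why those scores carry no signal. The function name and input shape (one user-to-score mapping per graph) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def combine_influence(per_graph_scores: Iterable[Mapping[str, float]]) -> Dict[str, float]:
    """Sum each user's normalized PageRank over all graphs they appear in.

    Scores exactly equal to 1 are dropped as noise before summing.
    Summation is an assumed combination rule.
    """
    influence: Dict[str, float] = defaultdict(float)
    for scores in per_graph_scores:
        for user, score in scores.items():
            if score == 1.0:
                continue  # skip baseline scores to reduce noise
            influence[user] += score
    return dict(influence)
```

For example, a user scoring 3.0 in one repository's graph and 2.0 in another ends up with an influence of 5.0, while a user who only ever scores 1 does not appear at all.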

PageRank

PageRank is a link analysis algorithm that assigns a numerical weight to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. In this survey, the elements are the watch events, and the links come from the following relationships among their actors.
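A dependency-free sketch of weighted PageRank by power iteration, which is one standard way to compute it; the damping factor 0.85 and the uniform redistribution of rank from dangling vertices are common defaults, not settings confirmed for this project. A library such as networkx offers the same computation through `networkx.pagerank(G, alpha=0.85, weight="weight")`.

```python
from typing import Dict, Hashable, List, Tuple

Node = Hashable


def pagerank(
    nodes: List[Node],
    edges: List[Tuple[Node, Node, float]],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 100,
) -> Dict[Node, float]:
    """Weighted PageRank by power iteration.

    edges are (source, target, weight) with positive weights; each source
    passes rank to its targets in proportion to edge weight. Vertices with
    no outgoing edges (dangling) spread their rank uniformly.
    """
    n = len(nodes)
    out_weight: Dict[Node, float] = {v: 0.0 for v in nodes}
    for src, _, w in edges:
        out_weight[src] += w
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, dst, w in edges:
            new[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:  # converged
            break
    return rank
```

Fed with a cascade graph whose edges carry the Fibonacci weights, the earliest influential stargazers accumulate the most rank.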

Normalized PageRank

Since PageRank scores are specific to a single graph, combining PageRanks from multiple graphs requires normalizing them first. Each PageRank is normalized by dividing it by the least PageRank in its graph. There is a gentle introduction to the Normalized PageRank.
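A minimal sketch of the normalization. Because raw PageRank sums to 1 over a graph, vertices in large graphs get small scores; dividing by the least score (typically a vertex with no incoming links) roughly removes that dependence on graph size. The function name is hypothetical.

```python
from typing import Dict, Hashable


def normalize_pagerank(scores: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Divide every PageRank by the least PageRank in the same graph.

    The least-ranked vertex then scores exactly 1, putting graphs of
    different sizes on a common scale before their scores are combined.
    """
    least = min(scores.values())
    return {v: s / least for v, s in scores.items()}
```

For example, scores of 0.5, 0.25 and 0.25 normalize to 2.0, 1.0 and 1.0.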

Classification by Language

Besides the general ranking, we can also rank by language: the GitHub API provides repository metadata that includes each repository's language, so PageRank can be applied only to the graphs of repositories in the same language. However, the results may not be very meaningful because this classification of languages is naive.
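The GitHub repository API reports a single primary `language` per repository (possibly null), which is a likely reason the classification is naive: a repository mixing C++ and JavaScript counts only under one of them. This sketch assumes the metadata has already been fetched into a dict keyed by full name; only the cascade graphs of the returned repositories would then go through PageRank. The function name is hypothetical.

```python
from typing import List, Mapping, Optional


def repos_in_language(
    metadata: Mapping[str, Mapping[str, Optional[str]]], language: str
) -> List[str]:
    """Return full names of repositories whose primary language matches exactly."""
    return sorted(
        name for name, meta in metadata.items() if meta.get("language") == language
    )
```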

Integrated Process

For better performance, run python task_gen_events_graphs-cal_pagerank-cal_influence.py, which integrates the steps from task_gen_events_graphs through task_cal_influence into a single process.

Result Analysis

Histogram of Top 10 in General

Our goal is to measure a user's total influence on stars. The maximum number of stars a user can directly influence is (starred + repos) * followers, where starred is the number of repositories the user starred, repos is the user's number of public repositories, and followers is the user's number of followers; comparing this product with the ranking helps evaluate our method. To keep the histogram readable, it plots the square root of each product. To draw this histogram, run python task_draw_histogram General after calculating the influence. The histogram shows the products falling from higher-ranked to lower-ranked users, which indicates that the PageRank method works.
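To make the bound concrete, a hypothetical user who starred 10 repositories, owns 5 public repositories, and has 60 followers could directly influence at most (10 + 5) * 60 = 900 stars, plotted as sqrt(900) = 30. The helper below is illustrative and not taken from task_draw_histogram; reading starred as the user's star count is an assumption.

```python
import math


def influence_upper_bound(starred: int, repos: int, followers: int) -> float:
    """Square root of (starred + repos) * followers, the value plotted per user."""
    return math.sqrt((starred + repos) * followers)
```

The square root compresses the range so that users with thousands of followers do not flatten everyone else's bars.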

Data Insights

Evolving Graph Animation

The evolving graph animation captures the time series of watch events and their connections, so we can analyze how compact a repository's community is by observing the clusters that form during the animation. The animation uses one frame per hour of the timeline, and hours with no events are collapsed. To make the animation, run python task_draw_graphs {repository's full name}.
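The frame rule can be sketched as follows; the drawing itself in task_draw_graphs is not shown. That each frame is cumulative (showing the whole graph grown so far, so clusters can be seen forming) is an assumption, and `hourly_frames` is a hypothetical name.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Vertex = Tuple[datetime, str, str]  # (created_at, repository, actor)


def hourly_frames(vertices: List[Vertex]) -> List[List[Vertex]]:
    """Group watch-event vertices into cumulative frames, one per active hour.

    Hours with no events produce no frame, so idle gaps in the timeline
    are collapsed. Frame k holds every vertex up to the k-th active hour.
    """
    by_hour: Dict[datetime, List[Vertex]] = {}
    for v in sorted(vertices):
        hour = v[0].replace(minute=0, second=0, microsecond=0)
        by_hour.setdefault(hour, []).append(v)
    frames: List[List[Vertex]] = []
    shown: List[Vertex] = []
    for hour in sorted(by_hour):
        shown = shown + by_hour[hour]  # new list keeps earlier frames intact
        frames.append(shown)
    return frames
```

Stars at 19:05, 19:40 and 23:01 on the same day yield two frames rather than five, since the empty hours 20:00 to 22:00 are skipped.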

Observing the Growth of a Popular Repository

The popular repository josephmisiti/awesome-machine-learning was created at 2014-07-15T19:11:19Z, within the collection period, so the animation covers its growth. The clusters in the graph may be communities: there is a main cluster in the center that grows as time passes. In some frames, most parts of the graph grew simultaneously, perhaps because the repository was spread outside GitHub.

Fabrication Detection

sebyddd/YouAreAwesome was found because of its strange appearance. It was created at 2014-08-18T18:50:57Z, accompanied by the post How to get #1 trending on GitHub or “GitHub’s security flaws”. According to the post, the stargazers were fabricated and arrived in a burst. The animation shows some clues: it is much shorter than that of josephmisiti/awesome-machine-learning because of the burst, and it lacks scattered clusters because the fabricated stargazers had no natural connections.

Community Overlapping

NodeJS Community Overlaps on C++ and JavaScript

The Venn diagram of the top 25 in C++ and JavaScript has an intersection of 6 developers because many NodeJS repositories are written in C++; reviewing the developers in the intersection confirms this. The intersection contains mcollina, jeresig, sindresorhus, hughsk, andrew and visionmedia.
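The size of the intersection can be checked directly against the C++ and JavaScript lists published in this README:

```python
# Top 25 lists copied from this README, in rank order.
TOP25_CPP = [
    "jeresig", "rogerwang", "r-lyeh", "osteele", "zcbenz", "jwerle",
    "sindresorhus", "fabpot", "satoruhiga", "andrew", "ideawu", "BYVoid",
    "hughsk", "vczh", "hij1nx", "chenshuo", "visionmedia", "kylemcdonald",
    "patriciogonzalezvivo", "youxiachai", "mcollina", "creationix",
    "eugeneware", "JacksonTian", "indutny",
]
TOP25_JS = [
    "visionmedia", "sindresorhus", "substack", "turingou", "andrew",
    "MatthewMueller", "juliangruber", "jeresig", "addyosmani", "maxogden",
    "paulirish", "studiomohawk", "azu", "cheeaun", "feross", "mathiasbynens",
    "mafintosh", "TooTallNate", "yyx990803", "mcollina", "fengmk2", "hughsk",
    "ianstormtaylor", "igrigorik", "hakimel",
]

# The Venn intersection is a plain set intersection of the two rankings.
overlap = set(TOP25_CPP) & set(TOP25_JS)
print(len(overlap), sorted(overlap))
# prints: 6 ['andrew', 'hughsk', 'jeresig', 'mcollina', 'sindresorhus', 'visionmedia']
```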

Software Prerequisites

Top 25 Influential Developers in General

  1. visionmedia
  2. sindresorhus
  3. mattt
  4. steipete
  5. daimajia
  6. andrew
  7. JakeWharton
  8. Trinea
  9. substack
  10. onevcat
  11. lexrus
  12. stormzhang
  13. turingou
  14. myell0w
  15. youxiachai
  16. addyosmani
  17. igrigorik
  18. jeresig
  19. MatthewMueller
  20. ManuelPeinado
  21. juliangruber
  22. mattn
  23. azu
  24. romaonthego
  25. xhzengAIB

Top 25 Influential Developers in Python

  1. kennethreitz
  2. mitsuhiko
  3. rochacbruno
  4. avelino
  5. jezdez
  6. lepture
  7. visionmedia
  8. pydanny
  9. saghul
  10. vinta
  11. clowwindy
  12. dahlia
  13. fengmk2
  14. tangqiaoboy
  15. jd
  16. numbbbbb
  17. osantana
  18. ionelmc
  19. jefftriplett
  20. tonyseek
  21. Zulko
  22. reduxionist
  23. turingou
  24. ellisonleao
  25. dcramer

Top 25 Influential Developers in JavaScript

  1. visionmedia
  2. sindresorhus
  3. substack
  4. turingou
  5. andrew
  6. MatthewMueller
  7. juliangruber
  8. jeresig
  9. addyosmani
  10. maxogden
  11. paulirish
  12. studiomohawk
  13. azu
  14. cheeaun
  15. feross
  16. mathiasbynens
  17. mafintosh
  18. TooTallNate
  19. yyx990803
  20. mcollina
  21. fengmk2
  22. hughsk
  23. ianstormtaylor
  24. igrigorik
  25. hakimel

Top 25 Influential Developers in Go

  1. visionmedia
  2. mattn
  3. dgryski
  4. Unknwon
  5. codegangsta
  6. rakyll
  7. daaku
  8. igrigorik
  9. bradfitz
  10. c4milo
  11. mitchellh
  12. astaxie
  13. lunny
  14. spf13
  15. mreiferson
  16. andrew
  17. philips
  18. crosbymichael
  19. fatih
  20. samuel
  21. codahale
  22. pengwynn
  23. michaelhood
  24. armon
  25. takuan-osho

Top 25 Influential Developers in Ruby

  1. ankane
  2. mattt
  3. andrew
  4. JuanitoFatas
  5. igrigorik
  6. goshakkk
  7. hsbt
  8. amatsuda
  9. chloerei
  10. josh
  11. flyerhzm
  12. huacnlee
  13. fgrehm
  14. futoase
  15. defunkt
  16. rkh
  17. parkr
  18. joker1007
  19. ryanb
  20. pengwynn
  21. mitchellh
  22. kenn
  23. r7kamura
  24. maccman
  25. takkanm

Top 25 Influential Developers in PHP

  1. GrahamCampbell
  2. sebastianbergmann
  3. fabpot
  4. Ocramius
  5. vojtech-dobes
  6. msurguy
  7. barryvdh
  8. JeffreyWay
  9. philsturgeon
  10. nikic
  11. laracasts
  12. lsmith77
  13. panique
  14. phalcon
  15. Zauberfisch
  16. taylorotwell
  17. dg
  18. igorw
  19. pminnieur
  20. harikt
  21. cfoellmann
  22. Ph3nol
  23. pippinsplugins
  24. jasonlewis
  25. Anahkiasen

Top 25 Influential Developers in Perl

  1. tokuhirom
  2. kraih
  3. miyagawa
  4. moznion
  5. DHowett
  6. visionmedia
  7. turingou
  8. agentzh
  9. lulzlabs
  10. kazeburo
  11. ingydotnet
  12. skx
  13. brendangregg
  14. gugod
  15. jberger
  16. sjackman
  17. oetiker
  18. rjbs
  19. pjf
  20. naoya
  21. goccy
  22. hirose31
  23. mattn
  24. wireghoul
  25. jonreid

Top 25 Influential Developers in CSS

  1. mdo
  2. sindresorhus
  3. addyosmani
  4. andrew
  5. zenorocha
  6. mrmrs
  7. visionmedia
  8. sahat
  9. turingou
  10. jxnblk
  11. gabrielecirulli
  12. jeresig
  13. umaar
  14. daneden
  15. csswizardry
  16. mreiferson
  17. sofish
  18. youxiachai
  19. necolas
  20. daimajia
  21. vitorbritto
  22. studiomohawk
  23. goshakkk
  24. joewalnes
  25. cheeaun

Top 25 Influential Developers in C

  1. torvalds
  2. cloudwu
  3. visionmedia
  4. mattn
  5. antirez
  6. julycoding
  7. igrigorik
  8. jwerle
  9. c9s
  10. steipete
  11. laruence
  12. andrew
  13. huangz1990
  14. phalcon
  15. clowwindy
  16. cloudhead
  17. mattt
  18. saghul
  19. r-lyeh
  20. winocm
  21. tmm1
  22. orangeduck
  23. Constellation
  24. pengwynn
  25. Trinea

Top 25 Influential Developers in C++

  1. jeresig
  2. rogerwang
  3. r-lyeh
  4. osteele
  5. zcbenz
  6. jwerle
  7. sindresorhus
  8. fabpot
  9. satoruhiga
  10. andrew
  11. ideawu
  12. BYVoid
  13. hughsk
  14. vczh
  15. hij1nx
  16. chenshuo
  17. visionmedia
  18. kylemcdonald
  19. patriciogonzalezvivo
  20. youxiachai
  21. mcollina
  22. creationix
  23. eugeneware
  24. JacksonTian
  25. indutny

Top 25 Influential Developers in Java

  1. daimajia
  2. JakeWharton
  3. Trinea
  4. stormzhang
  5. ManuelPeinado
  6. jgilfelt
  7. dodola
  8. jpardogo
  9. youxiachai
  10. kyze8439690
  11. chrisbanes
  12. mcxiaoke
  13. soarcn
  14. flavienlaurent
  15. baoyongzhang
  16. sd6352051
  17. snowdream
  18. castorflex
  19. hotchemi
  20. romannurik
  21. pedrovgs
  22. vbauer
  23. nostra13
  24. RomainPiel
  25. johnkil

Top 25 Influential Developers in C#

  1. shanselman
  2. tugberkugurlu
  3. jamesmontemagno
  4. paulcbetts
  5. prime31
  6. madskristensen
  7. robconery
  8. filipw
  9. keijiro
  10. leekelleher
  11. Haacked
  12. pierceboggan
  13. Cheesebaron
  14. migueldeicaza
  15. ayende
  16. davidfowl
  17. mythz
  18. yreynhout
  19. Rohansi
  20. Chandu
  21. adamralph
  22. neuecc
  23. punker76
  24. UnityPatterns
  25. daimajia

Top 25 Influential Developers in Objective-C

  1. steipete
  2. mattt
  3. myell0w
  4. onevcat
  5. lexrus
  6. xhzengAIB
  7. romaonthego
  8. jessesquires
  9. iiiyu
  10. krzysztofzablocki
  11. jamztang
  12. supermarin
  13. nicklockwood
  14. soffes
  15. neonichu
  16. cyndibaby905
  17. 0xced
  18. indragiek
  19. EvgenyKarkan
  20. chroman
  21. mps
  22. nst
  23. tangqiaoboy
  24. andreamazz
  25. jpsim

Top 25 Influential Developers in Swift

  1. mattt
  2. lexrus
  3. onevcat
  4. iiiyu
  5. soffes
  6. robb
  7. romaonthego
  8. krzysztofzablocki
  9. jspahrsummers
  10. indragiek
  11. AshFurrow
  12. neonichu
  13. tangqiaoboy
  14. hollance
  15. chroman
  16. qiaoxueshi
  17. myell0w
  18. rnystrom
  19. JacksonTian
  20. fastred
  21. jessesquires
  22. jakemarsh
  23. andreamazz
  24. jpsim
  25. youxiachai

Top 25 Influential Developers in Haskell

  1. sdiehl
  2. ekmett
  3. puffnfresh
  4. bos
  5. bitemyapp
  6. jfischoff
  7. cartazio
  8. cloudhead
  9. darinmorrison
  10. CodeBlock
  11. rehno-lindeque
  12. chrisdone
  13. jgm
  14. egonSchiele
  15. ocharles
  16. TimothyKlim
  17. rockymadden
  18. Gabriel439
  19. feuerbach
  20. vincenthz
  21. copumpkin
  22. adinapoli
  23. maxpow4h
  24. jonsterling
  25. Heather

Top 25 Influential Developers in Scala

  1. ryanlecompte
  2. xuwei-k
  3. jboner
  4. lihaoyi
  5. softprops
  6. paulp
  7. non
  8. ktoso
  9. krasserm
  10. puffnfresh
  11. takezoe
  12. milessabin
  13. mateiz
  14. rockymadden
  15. dlwh
  16. TimothyKlim
  17. mandubian
  18. tototoshi
  19. mrdoob
  20. ornicar
  21. hexx
  22. rxin
  23. jamieowen
  24. jsuereth
  25. pathikrit

Top 25 Influential Developers in Clojure

  1. swannodette
  2. yogthos
  3. ztellman
  4. ptaoussanis
  5. weavejester
  6. mikera
  7. michalmarczyk
  8. aphyr
  9. nathanmarz
  10. sritchie
  11. noprompt
  12. cgrand
  13. brandonbloom
  14. mkhoeini
  15. technomancy
  16. mpenet
  17. stuartsierra
  18. cemerick
  19. magnars
  20. niwibe
  21. zcaudate
  22. sgrove
  23. fogus
  24. mkremins
  25. michaelklishin

Top 25 Influential Developers in Erlang

  1. ferd
  2. 5HT
  3. DavidAlphaFox
  4. msantos
  5. benoitc
  6. jj1bdx
  7. saa
  8. eproxus
  9. jlouis
  10. RJ
  11. kocolosk
  12. nslater
  13. rvirding
  14. knutin
  15. proger
  16. mokevnin
  17. voluntas
  18. yrashk
  19. Licenser
  20. s1n4
  21. artemeff
  22. uwiger
  23. choptastic
  24. lpgauth
  25. cmeiklejohn

About

For GitHub Data Challenge 2014
