
inventorexploration
====
This is the code accompanying the paper: Jeff Alstott, Giorgio Triulzi, Bowen Yan, Jianxi Luo. (2017). "Inventors' Explorations Across Technology Domains." The manuscript is in this GitHub repository here, on SSRN here, and published in Design Science here.

How to Reproduce the Study
====
The code base is organized as a set of IPython notebooks, which are also duplicated as plain Python .py scripts. To reproduce the full study, the only thing you should need to touch directly is the Manuscript_Code notebook, which walks through all the steps of:

  1. calculating the relatedness between technology domains from patent data, by creating randomized versions of history and comparing the empirical data to them;
  2. creating a predictive model of inventors' movements, using relatedness, popularity, and other factors as predictors;
  3. creating figures for the manuscript, the source code for which is also contained in this repository.
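As a rough illustration of step 1 only (not the paper's actual pipeline, which uses degree-preserving randomization via pyBiRewire on the full patent data), the following sketch computes domain co-occurrence on toy patents and compares it to a simple shuffled null model. All data and function names here are hypothetical.

```python
import random

rng = random.Random(0)

def cooccurrence(domains_per_patent):
    """Count how often each pair of domains appears on the same patent."""
    counts = {}
    for doms in domains_per_patent:
        for i, a in enumerate(doms):
            for b in doms[i + 1:]:
                pair = tuple(sorted((a, b)))
                counts[pair] = counts.get(pair, 0) + 1
    return counts

def null_cooccurrence(domains_per_patent, n_shuffles=1000):
    """Mean co-occurrence after shuffling domain labels across patents,
    preserving each patent's number of domains (a crude randomized 'history')."""
    sizes = [len(doms) for doms in domains_per_patent]
    pool = [dom for doms in domains_per_patent for dom in doms]
    totals = {}
    for _ in range(n_shuffles):
        rng.shuffle(pool)
        fake, idx = [], 0
        for size in sizes:
            fake.append(pool[idx:idx + size])
            idx += size
        for pair, count in cooccurrence(fake).items():
            totals[pair] = totals.get(pair, 0) + count
    return {pair: total / n_shuffles for pair, total in totals.items()}

# Toy patents, each tagged with hypothetical technology domains.
patents = [["A", "B"], ["A", "B"], ["A", "C"], ["B", "C"], ["A", "B", "C"]]
empirical = cooccurrence(patents)
expected = null_cooccurrence(patents)
# Relatedness as the ratio of observed to null-expected co-occurrence.
relatedness = {pair: empirical[pair] / expected[pair] for pair in empirical}
```

A relatedness ratio above 1 means two domains co-occur more often than the shuffled baseline would suggest; the real study draws this comparison against thousands of randomized versions of the historical data.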

The data files involved are too large to host on GitHub (>100MB), so they are hosted on Zenodo here. Just download the contents to data/ and you should be good to go.

How to Do Your Own Analysis
====
Reproducing the full study would require significant computational resources (see below). As such, the data download also includes final versions of the data, which will allow you to recreate just the final analyses described in the manuscript. This is also a sensible starting place for doing your own analysis, answering new questions with the same data.

Randomization with a cluster
====
This pipeline involves creating thousands of randomized versions of the historical patent data. In order to do this, we employ a computational cluster running the PBS job scheduling system, and running this code currently assumes you have one. If you are lucky enough to be from the future, maybe you have a big enough machine that you can simply create and analyze thousands of randomized versions of the historical patent data in a simple for loop. We don't yet support that.
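For readers unfamiliar with PBS, a job-array submission script for the randomizations might look something like the following Torque-flavored sketch. The job name, resource requests, array size, and script name are all hypothetical, not taken from this repository.

```shell
#!/bin/bash
#PBS -N randomize_patents                  # job name (hypothetical)
#PBS -l nodes=1:ppn=1,walltime=04:00:00    # resources per task (hypothetical)
#PBS -t 1-1000                             # one array task per randomized history

cd "$PBS_O_WORKDIR"
# Each array task builds and analyzes one randomized version of the patent data,
# seeded by its array index so the runs are independent and reproducible.
python run_randomization.py --seed "$PBS_ARRAYID"   # hypothetical script name
```

Submitted with `qsub`, each of the 1000 array tasks runs the same script with a different `$PBS_ARRAYID`, which is how a PBS cluster parallelizes the "thousands of randomized versions" step.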

Dependencies
====
  - Python 3.x
  - powerlaw
  - seaborn
  - pyBiRewire
  - cmdstan
  - the standard scientific computing Python stack, which we recommend setting up by simply using the Anaconda Python distribution. Relevant packages include:
    - numpy
    - scipy
    - matplotlib
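One possible way to set up the Python dependencies above with Anaconda is sketched below; the environment name is hypothetical, and the exact install route for pyBiRewire and CmdStan may differ on your platform.

```shell
# Hypothetical setup sketch: create a conda environment with the scientific stack.
conda create -n inventorexploration python=3 numpy scipy matplotlib seaborn
conda activate inventorexploration

# powerlaw is available from PyPI.
pip install powerlaw

# pyBiRewire and CmdStan are installed separately; consult their own
# documentation for platform-specific instructions.
```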

Original Data Files
====
  - citing_cited.csv
  - PATENT_US_CLASS_SUBCLASSES_1975_2011.csv
  - pid_issdate_ipc.csv
  - disamb_data_ipc_citations_2.csv
  - pnts_multiple_ipcs_76_06_valid_ipc.csv
  - patent_ipc_1976_2010.

Contact
====
Please contact the authors if you have questions/comments/concerns/stories:
  - gtriulzi at mit
  - alstott at mit