Predicting solid state qubit candidates

This is the main repository for predicting solid state qubit candidates for quantum technology. Its first release constitutes the main work behind the master's thesis included in this repository.

Run project / Reproduction

This project is centered around an exploratory analysis using Jupyter notebooks. It is not necessary to run anything to see the results; simply consult the notebooks, either here on GitHub or through Jupyter's nbviewer project. However, if you intend to run the notebooks, follow the steps in the next section.


Jupyter notebooks

  1. Clone the project.

  2. Make a virtual environment:

     python3 -m venv .venv
     source .venv/bin/activate
  3. Run the following command to install all packages defined in setup:

     python3 -m pip install -e .
  4. Add your API keys from Materials Project and Citrination to your environment variables (e.g. use an '.env'-file). Then run the following command to start Jupyter Notebook:

     jupyter notebook
  5. Run all notebooks chronologically.
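
Step 4 leaves the variable names open. The following is a minimal sketch of reading both keys from an '.env'-file using only the standard library. The names `MP_API_KEY` and `CITRINATION_API_KEY` and both helper functions are assumptions made for illustration, so check the notebooks for the names they actually read.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Variables that are already set in the environment are not overwritten.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))


def get_api_keys(names=("MP_API_KEY", "CITRINATION_API_KEY")):
    """Return the requested keys, failing early if any of them is missing.

    The default names are hypothetical; use the names the notebooks expect.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in names}
```

Calling `load_env_file()` followed by `get_api_keys()` at the top of a notebook would surface a missing key immediately instead of partway through a long download.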

As an alternative to running the notebooks to generate data, the tools and code developed for this purpose are also available through make.


The following command extracts Materials Project (MP) data based on 0.1 eV and ICSD entry, and then starts the featurization process. This is the only way to run the featurizer at this stage.

    make features
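
To make the selection step concrete, here is a minimal sketch of such a filter applied to entries that have already been downloaded. It assumes that the 0.1 eV criterion is a lower bound on the band gap and that each entry is a dictionary with `band_gap` and `icsd_ids` fields. Both are assumptions made for illustration, `select_entries` is a hypothetical helper, and the project's own extraction code in src may work differently.

```python
def select_entries(entries, min_band_gap=0.1):
    """Keep entries with a band gap of at least `min_band_gap` (eV) and an ICSD entry.

    Assumed record layout: {"material_id": str, "band_gap": float, "icsd_ids": list}.
    """
    selected = []
    for entry in entries:
        band_gap = entry.get("band_gap")
        has_icsd = bool(entry.get("icsd_ids"))
        if band_gap is not None and band_gap >= min_band_gap and has_icsd:
            selected.append(entry)
    return selected


# Made-up records for demonstration, not real Materials Project data.
sample = [
    {"material_id": "a", "band_gap": 1.2, "icsd_ids": [1]},
    {"material_id": "b", "band_gap": 0.0, "icsd_ids": [2]},
    {"material_id": "c", "band_gap": 2.5, "icsd_ids": []},
]
print([entry["material_id"] for entry in select_entries(sample)])
```

Only the first record passes both conditions: the second has no band gap above the threshold and the third has no ICSD entry.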

The following command generates all data used in this project and is an easier alternative to running 01-generateDataset-notebook.ipynb.

    make data

Is this repo up to date?

New data is added to Materials Project at irregular intervals, and every update requires a new featurization run. As currently implemented, this is a long and tedious process. Therefore, the featurized data in this repo only includes the December 2020 version of the MP data.

Project Organization

├── Makefile           <- Makefile with commands like `make data` or `make train`
├──          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── raw            <- The original, immutable data dump.
│   └── <approaches>       <- The final, canonical data sets for modeling.
├── docs               <- A default Sphinx project.
├── models             <- Trained and serialized models, model predictions, or model summaries
│   └── <approaches>    <- Similarly for the other approaches
│       ├── summary
│       └── trained-models
├── notebooks                                <- Jupyter notebooks.
│   ├── 01-generateDataset-notebook.ipynb    <- Generate data notebooks.
│   ├── 02-buildFeatures-notebook.ipynb      <- Construct features.
│   ├── 03-preprocessing-notebook.ipynb      <- Clean and preprocess features.
│   ├── method-01-Ferrenti-approach                    
│   │   ├── 04-dataMining-notebook.ipynb                 <- Datamining approach 1.
│   │   └── PCA-NUMBER-<insert pca number>
│   │         ├── 05-supervisedLearning-notebook.ipynb   <- Machine learning and predictions
│   │         └── 06-postAnalysis-notebook.ipynb         <- Analyse the predictions
│   ├── method-02-Extended-Ferrenti-approach          
│   │   ├── 04-dataMining-notebook.ipynb                 <- Datamining approach 2.
│   │   └── PCA-NUMBER-<insert pca number>
│   │         ├── 05-supervisedLearning-notebook.ipynb   <- Machine learning and predictions
│   │         └── 06-postAnalysis-notebook.ipynb         <- Analyse the predictions
│   └── method-03-Empirical-approach                  
│       ├── 04-dataMining-notebook.ipynb                 <- Datamining approach 3.
│       └── PCA-NUMBER-<insert pca number>
│             ├── 05-supervisedLearning-notebook.ipynb   <- Machine learning and predictions
│             └── 06-postAnalysis-notebook.ipynb         <- Analyse the predictions
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│   └──       
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
├── requirements.txt   <- The requirements file for reproducing the analysis environment
├──           <- makes project pip installable (pip install -e .) so src can be imported
└── src                <- Source code for use in this project.
    ├──    <- Makes src a Python module
    ├── data           <- Scripts to download or generate data
    │   ├──
    │   ├──
    │   ├──
    │   ├──
    │   ├──
    │   ├──
    │   ├──
    │   ├──
    │   └──
    ├── features       <- Scripts to turn raw data into features for modeling
    │   └──
    │   └──
    │   └──
    ├── models         <- Scripts to train models and then use trained models to make
    │   │                 predictions
    │   ├──
    │   └──
    └── visualization  <- Scripts to create exploratory and results oriented visualizations
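
The numeric prefixes in the notebook names above encode the intended run order. As a small sketch, the helper below lists the shared notebooks followed by those of a single method, sorted by that prefix. `notebook_step` and `ordered_notebooks` are hypothetical helpers written for illustration and are not part of src.

```python
import re
from pathlib import Path


def notebook_step(path):
    """Return the leading step number of a notebook file name, e.g. 4 for '04-...'."""
    match = re.match(r"(\d+)-", path.name)
    # Files without a numeric prefix are sorted last.
    return int(match.group(1)) if match else float("inf")


def ordered_notebooks(notebooks_dir, method_dir):
    """List the shared notebooks, then those of one method, in step order."""
    root = Path(notebooks_dir)
    shared = list(root.glob("*.ipynb"))
    method = [
        path
        for path in (root / method_dir).rglob("*.ipynb")
        # Ignore the autosave copies Jupyter keeps in hidden folders.
        if ".ipynb_checkpoints" not in path.parts
    ]
    return sorted(shared + method, key=lambda p: (notebook_step(p), str(p)))
```

For example, `ordered_notebooks("notebooks", "method-01-Ferrenti-approach")` would return the notebooks numbered 01 to 03 first, followed by the 04, 05 and 06 notebooks of that method.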

Project based on the cookiecutter data science project template. #cookiecutterdatascience