Wrappers of Jerome Friedman's coordinate-descent Fortran implementation of lasso/elastic net regression from the R "glmnet" package.

ceholden/glmnet-python
glmnet-python

Python wrapper to the Fortran implementation of GLMNET by Friedman et al.

This is a fork of glmnet-python by David Warde-Farley who is the original author of the majority of the wrapper code, especially all of the difficult parts! This fork has restructured the code to follow the scikit-learn model estimation API while providing some additional capabilities, including cross-validation of lambda values.
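For orientation, the lambda values mentioned above are the penalty weights in the elastic net objective. The formula below uses the notation of the glmnet paper by Friedman et al. for a Gaussian response; this README does not spell the objective out, so treat the symbols as an assumption about what the Fortran code minimizes:

```latex
\min_{\beta_0,\,\beta}\;
  \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  \;+\; \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
  + \alpha\,\lVert\beta\rVert_1\right]
```

Setting \alpha = 1 gives the lasso and \alpha = 0 gives ridge regression. Cross-validation then selects \lambda from a path of candidate values.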

Comparison with other forks

As of December 18th, 2015, there is another fork of glmnet-python by GitHub user "shuras" that I would recommend using instead of this fork. That fork has a wider range of capabilities and is under more active development. A non-exhaustive list of capabilities offered by the "shuras" et al. fork includes:

  • Plot and diagnostic utilities
  • Logistic regression
  • Handling of sparse data
  • Multi-response elastic nets
  • Tests and more examples
  • Planned support of Cox and Poisson models

Basically, steer clear of this fork unless you want a fast replacement for running GLMNET within a scikit-learn API framework.

Requirements

  • numpy>=1.3
  • scikit-learn>=0.14.0

Building

Getting double precision to work without modifying Friedman's code requires some compiler trickery. The wrappers are written so that everything returned is expected to be a real*8, i.e. a double-precision floating point number. Unfortunately, the Fortran code follows the common practice of declaring variables simply as real and letting the compiler decide on the appropriate width. f2py assumes that a real is always 4 bytes (single precision), hence the manual change to real*8 in the wrappers. That change in turn requires the Fortran code itself to be compiled with 8-byte reals; otherwise bad things will happen (the stack will be blown, the program will hang or segfault, etc.).
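To see why a width mismatch is harmful, here is a small stand-alone illustration using only Python's standard library. It is an analogy, not what f2py or the wrapper literally does: it packs one 8-byte double and then reads the same bytes back as 4-byte singles, which is roughly what happens when one side of the interface assumes real*8 and the other assumes a 4-byte real.

```python
import struct

# One double-precision value, laid out as 8 little-endian bytes.
value = 1.5
raw = struct.pack("<d", value)

# Misreading those 8 bytes as two 4-byte single-precision reals.
as_singles = struct.unpack("<2f", raw)

# Reading them back with the matching width recovers the value exactly.
as_double = struct.unpack("<d", raw)[0]

print("bytes:", len(raw))
print("read as two singles:", as_singles)
print("read as one double:", as_double)
```

Neither of the two singles equals the original value, and the data occupies twice as many bytes as a single-precision reader expects, which is how buffers end up overrun.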

AFAIK, this package requires gfortran to build. g77 will not work as it does not support -fdefault-real-8.

The way to get this to build properly is:

python setup.py config_fc --fcompiler=gnu95 \
    --f77flags='-fdefault-real-8' \
    --f90flags='-fdefault-real-8' build

The --fcompiler=gnu95 business may be omitted if gfortran is the only Fortran compiler you have installed, but the compiler flags are essential.
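If you would rather drive the build from a script, the command above can be wrapped as in the following minimal sketch. The helper names `build_command` and `build` are hypothetical and not part of this package; the only things taken from this README are the setup.py arguments and the gfortran requirement. It assumes Python 3.3 or later for `shutil.which`.

```python
import shutil
import subprocess
import sys

# Flag that makes gfortran treat plain `real` as an 8-byte real.
REAL8_FLAG = "-fdefault-real-8"


def build_command(python=sys.executable):
    """Return the build command from this README as an argument list."""
    return [
        python, "setup.py", "config_fc",
        "--fcompiler=gnu95",
        "--f77flags=" + REAL8_FLAG,
        "--f90flags=" + REAL8_FLAG,
        "build",
    ]


def build():
    """Fail early if gfortran is missing, then run the build."""
    if shutil.which("gfortran") is None:
        raise RuntimeError("gfortran not found on PATH; it is required to build")
    subprocess.run(build_command(), check=True)
```

Calling `build()` from the repository root runs the same command as above. Passing the arguments as a list avoids shell quoting issues with the flag values.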

License

Friedman's code in glmnet.f is released under the GPLv2, necessitating that any code that uses it (including my wrapper, and anyone using my wrapper) be released under the GPLv2 as well. See LICENSE for details.

That said, to the extent that they are useful in the absence of the GPL Fortran code (i.e. not very), my portions may be used under the 3-clause BSD license.

Thanks

  • Thanks to David Warde-Farley for his original work on the Python wrapper that contributed 99% of the effort I've based my additions on.

From David Warde-Farley:

  • To Jerome Friedman for the fantastically fast and efficient Fortran code.
  • To Pearu Peterson for writing f2py and answering my dumb questions.
  • To Dag Sverre Seljebotn for his help with f2py wrangling.
  • To Kevin Jacobs for showing me his wrapper which helped me side-step some problems with the auto-generated .pyf.
