william-tai/Practical-Machine-Learning

Practical-Machine-Learning

This book is best suited for professional data scientists, or aspiring data scientists, who want to learn the fundamentals of machine learning techniques hands-on. It covers the most efficient ways to apply and implement these techniques on large datasets for a given problem definition, using the most relevant machine learning frameworks and tools on or off the Hadoop platform. Readers are expected to have basic programming skills in Java; knowledge of any scripting language will be a bonus.

This book explores all the important machine learning techniques, along with the specific behavioral differences and implementation intricacies that arise with a parallel or distributed processing approach. For each technique, a deep dive into the internals of the algorithm is accompanied by example implementations using leading and emerging machine learning frameworks and tools such as R, SPSS, Apache Mahout, Python, Julia and Spark. This book helps readers master machine learning techniques and gain the ability to identify and apply the appropriate technique in a given problem context, including large datasets, multi-core cluster-based learning, distributed learning, and parallel computation tools and libraries. Readers will be exposed to a range of machine learning frameworks, and for each framework, detailed implementation aspects such as function libraries, syntax, installation or set-up, and integration with Hadoop (wherever applicable) will be covered.

Until the recent past, the machine learning community assumed sequential algorithms operating on data that fits in memory. This assumption is no longer realistic for many recent scenarios, and dropping it has brought some interesting perspectives to advanced machine learning. Despite this growing interest, there haven't been many publications on how these solutions integrate with our data management systems. The success of data-driven solutions to complex problems, together with dropping infrastructure and storage costs, has brought the focus onto large-scale machine learning. Below is a list of topics that will be covered in this book:

  1. Learn and master platforms, algorithms, and applications for machine learning techniques classified under supervised, unsupervised, semi-supervised, reinforcement and deep learning.
  2. Analyze and prepare large datasets and design your own machine learning system.
  3. Take a deep dive into each machine learning algorithm and learn how to implement it in more than one way (explore alternative implementation platforms and learn how to decide which one to choose), given the problem context.
  4. For each of the identified platforms, learn how to set up the environment, load large-scale data, explore the syntax and understand the implementation nuances.
  5. How does machine learning link with Hadoop? Understand Hadoop as a platform for the distributed and parallel processing paradigm.
  6. For each machine learning technique, take a deep dive into the internals of the concept and implement it using one or more of the identified tools or libraries, including Mahout, R, Python, SPSS and Spark. For each library or framework: a. learn to set up the environment, b. develop machine learning programs for real-world examples, and c. deploy and execute these programs on large datasets in Hadoop (wherever applicable) to identify precise patterns and predict outcomes.
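
To make the point about data that no longer fits in memory more concrete, here is a minimal Python sketch of computing a mean and variance chunk by chunk. It is an illustration under stated assumptions, not code from the book or this repository. Each chunk is reduced to a small summary (count, mean, sum of squared deviations), and summaries are merged with the pairwise update of Chan et al. Because the merge is associative, chunks could also be summarized in parallel on separate nodes and combined afterwards, which is the pattern Hadoop-style processing relies on. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical summary of one chunk: (count, mean, M2), where M2 is the
# sum of squared deviations from the chunk mean.
Stats = Tuple[int, float, float]

def chunk_stats(chunk: List[float]) -> Stats:
    """Summarize a single chunk that fits in memory."""
    n = len(chunk)
    if n == 0:
        return (0, 0.0, 0.0)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return (n, mean, m2)

def merge_stats(a: Stats, b: Stats) -> Stats:
    """Combine two partial summaries (pairwise update of Chan et al.)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return (n, mean, m2)

def streaming_mean_var(chunks: Iterable[List[float]]) -> Tuple[float, float]:
    """Mean and population variance over chunks, never holding all data at once."""
    total: Stats = (0, 0.0, 0.0)
    for chunk in chunks:
        total = merge_stats(total, chunk_stats(chunk))
    n, mean, m2 = total
    variance = m2 / n if n > 0 else 0.0
    return mean, variance
```

For example, `streaming_mean_var([[1, 2, 3], [4, 5], [6]])` gives a mean of 3.5 and a population variance of about 2.917, the same result as processing all six values at once. In practice the chunks would come from a file reader or a distributed storage layer rather than an in-memory list.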

This book covers all important machine learning techniques that include:

  1. Chapter 5: Decision Tree based learning methods - Decision trees using C4.5, C5.0 and Random Forests
  2. Chapter 6: Association rule based learning methods - Apriori and FP-growth
  3. Chapter 7: Instance based learning methods - K-Nearest Neighbors
  4. Chapter 7: Kernel based learning methods - Support Vector Machines
  5. Chapter 8: Clustering based learning methods - K-means clustering
  6. Chapter 9: Bayesian learning methods - Naive Bayes
  7. Chapter 10: Regression learning methods - Linear and Logistic regression
  8. Chapter 11: Deep learning methods
  9. Chapter 12: Reinforcement learning methods - Q-learning
  10. Chapter 13: Ensemble methods - Boosting (AdaBoost, Gradient Boosting), Random Forests
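
Regression (Chapter 10) is a convenient example of how a learning algorithm maps onto a MapReduce-style platform such as Hadoop or Spark. The Python sketch below illustrates the idea under stated assumptions and is not the book's implementation. It fits linear regression by batch gradient descent. A hypothetical `map_partition` step computes a partial gradient on each data partition, and a `reduce` step sums the partials before the shared weights are updated. Here the partitions are plain in-memory lists. On a cluster, each partition would live on a different node and the map calls could run in parallel.

```python
from functools import reduce
from typing import List, Tuple

Row = Tuple[List[float], float]            # (feature vector, target)
Partial = Tuple[List[float], float, int]   # (summed grad_w, summed grad_b, row count)

def map_partition(partition: List[Row], w: List[float], b: float) -> Partial:
    """Map step: sum the gradient of 0.5 * err**2 over one partition's rows."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in partition:
        # Prediction error for this row under the current model.
        err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
        for j, xj in enumerate(x):
            grad_w[j] += err * xj
        grad_b += err
    return grad_w, grad_b, len(partition)

def reduce_partials(a: Partial, c: Partial) -> Partial:
    """Reduce step: element-wise sum of two partial gradients and their counts."""
    return [p + q for p, q in zip(a[0], c[0])], a[1] + c[1], a[2] + c[2]

def train(partitions: List[List[Row]], n_features: int,
          lr: float = 0.1, epochs: int = 1000) -> Tuple[List[float], float]:
    """Batch gradient descent on half the mean squared error, one map/reduce round per epoch."""
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        # Each map call is independent; on a cluster these could run in parallel.
        partials = [map_partition(p, w, b) for p in partitions]
        grad_w, grad_b, n = reduce(reduce_partials, partials)
        # Average over all rows, then take a gradient step.
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

Suppose the data is generated by y = 2x + 1 for x = 0..5 and split across three partitions. In that case, `train(partitions, n_features=1)` converges to a weight close to 2 and a bias close to 1. The partitions return summed gradients and row counts rather than per-partition averages. This keeps the combined gradient correct when partitions have different sizes.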

For each of the learning methods, implementation source code is provided in the following programming languages and frameworks:

  1. Apache Mahout
  2. R
  3. Spark MLlib
  4. Python (scikit-learn)
  5. Julia (Java & Scala based)

The project is organized by programming language, then by chapter, and then by specific algorithm.

Languages

  • Java 27.6%
  • Julia 20.0%
  • Python 19.3%
  • Scala 13.4%
  • C++ 9.3%
  • R 7.0%
  • Other 3.4%