Practical-Machine-Learning

This book is for professional and aspiring data scientists who want to learn the fundamentals of machine learning techniques and the most efficient ways of applying and implementing them on large datasets. It takes a hands-on approach, using the most relevant machine learning frameworks and tools, on or off the Hadoop platform, for a given problem definition. Readers are expected to have basic programming skills in Java; knowledge of a scripting language is a bonus.

This book explores machine learning techniques along with the behavioral differences and implementation intricacies that arise with parallel or distributed processing. For each technique, a deep dive into the internals of the algorithm is accompanied by example implementations using leading and evolving machine learning frameworks and tools such as R, SPSS, Apache Mahout, Python, Julia and Spark. The book helps readers master machine learning techniques and gain the ability to identify and apply the appropriate technique in a given problem context. In the context of large datasets, it also covers multi-core and cluster-based learning, distributed learning, parallel computation tools and libraries, and more. Readers are introduced to a range of machine learning frameworks; for each framework, implementation aspects such as function libraries, syntax, installation or set-up, and integration with Hadoop (wherever applicable) are covered.

Until recently, the machine learning community assumed sequential algorithms operating on data that fits in memory. This assumption is no longer realistic for many recent scenarios, which has brought new perspectives to advanced machine learning. Despite this growing interest, there have not been many publications on how these solutions integrate with data management systems. The success of data-driven solutions for complex problems, together with falling infrastructure and storage costs, has brought the focus onto large-scale machine learning. Below is a list of topics that will be covered in this book:

Learn and master platforms, algorithms, and applications for machine learning techniques classified as supervised, unsupervised, semi-supervised, reinforcement and deep learning.
Analyze and prepare large data sets and design your own machine learning system.
Take a deep dive into each machine learning algorithm and learn how to implement it in more than one way, given the problem context (explore alternative implementation platforms and learn how to decide which one to choose).
For each of the identified platforms, learn how to set up the environment, load large-scale data, explore the syntax and understand the implementation nuances.
How does machine learning link with Hadoop? Understand Hadoop as a platform for the distributed and parallel processing paradigm.
For each machine learning technique, take a deep dive into the internals of the concept and implement it using one or more of the identified tools or libraries, including Mahout, R, Python, SPSS and Spark. For each library or framework: a. learn to set up the environment; b. develop machine learning programs for real-world examples; c. deploy and execute these programs on large data sets in Hadoop (wherever applicable) to identify patterns and predict outcomes.
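
The earlier point about data that no longer fits in memory can be illustrated with a minimal sketch of incremental (out-of-core style) learning. This is not code from the book or this repository: the `stream_rows` generator, the one-feature synthetic data and the learning rate are all hypothetical stand-ins, and a real system would read from files or HDFS and distribute the work across nodes.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[float, float]  # (x, y) pairs; a hypothetical one-feature dataset


def stream_rows(n: int, seed: int = 0) -> Iterator[Row]:
    """Yield synthetic rows one at a time, standing in for a file or HDFS reader."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        # Assumed ground truth for the illustration: y = 3x + 2 plus small noise.
        yield x, 3.0 * x + 2.0 + rng.gauss(0.0, 0.01)


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Group a stream into mini-batches without materializing the whole stream."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def fit_streaming(rows: Iterable[Row], batch_size: int = 32,
                  lr: float = 0.1) -> Tuple[float, float]:
    """Fit y ~ w * x + b with mini-batch gradient descent in a single pass."""
    w, b = 0.0, 0.0
    for batch in batches(rows, batch_size):
        # Gradients of the mean half-squared error over the current batch only.
        grad_w = sum((w * x + b - y) * x for x, y in batch) / len(batch)
        grad_b = sum((w * x + b - y) for x, y in batch) / len(batch)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = fit_streaming(stream_rows(50_000))
print(f"w={w:.2f}, b={b:.2f}")
```

Only one mini-batch is held in memory at a time, so the same loop works whether the stream has fifty thousand rows or far more than would fit in RAM. The fitted values should land close to the assumed slope and intercept.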

This book covers important machine learning techniques, including:

Chapter 5: Decision Tree based learning methods - Decision trees using C4.5, C5.0 and Random Forests
Chapter 6: Association rule based learning methods - Apriori and FP-growth
Chapter 7: Instance based learning methods - K-Nearest Neighbors
Chapter 7: Kernel based learning methods - Support Vector Machines
Chapter 8: Clustering based learning methods - K means clustering
Chapter 9: Bayesian learning methods - Naive Bayes
Chapter 10: Regression learning methods - Linear and Logistic regression
Chapter 11: Deep learning methods
Chapter 12: Reinforcement learning methods - Q-learning
Chapter 13: Ensemble methods - Boosting (Ada, Gradient), Random Forests
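
To give a flavor of one of the listed techniques, here is a minimal sketch of K-Nearest Neighbors classification (the instance-based method named above) using only the Python standard library. It is an illustration written for this overview, not the implementation shipped in the repository; the `knn_predict` helper and the tiny two-class dataset are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[Sequence[float], str]  # (feature vector, class label)


def knn_predict(train: List[Point], query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training points nearest to query."""
    # Rank training points by Euclidean distance to the query, keep the k closest.
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up data: two well-separated clusters labeled "a" and "b".
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

print(knn_predict(train, (1.1, 1.0)))  # prints "a"
print(knn_predict(train, (5.1, 5.0)))  # prints "b"
```

This brute-force version compares the query against every stored point, which is the part that becomes expensive on large datasets and motivates the distributed implementations the book discusses.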

For each of the learning methods, the implementation source code is provided in the following programming languages and frameworks:

Apache Mahout
R
Spark - MLlib
Python (scikit-learn)
Julia (Java & Scala based)

The project is organized by programming language, then by chapter, and then by specific algorithm.

julia
mahout
python-sckit-learn
r
spark
.gitignore
README.md
