Data Science Portfolio by Dimitrios Effrosynidis

This portfolio is a compilation of notebooks that I created for Data Science tasks such as tutorials, exploratory data analysis, and machine learning. More notebooks will be added as I learn new things and find time to write about them.

Visit my website or my Medium profile, where I include everything listed here and much more.

Below is a summary of them.

🔎 Outlier Detection — Theory, Visualizations, and Code

The article is available on Towards Data Science and the code is located here.

What is Outlier Detection?
Causes
Applications
Approaches
Taxonomy
Algorithms - Isolation Forest, Extended Isolation Forest, Local Outlier Factor, DBSCAN, One Class SVM, Ensemble
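
As a rough illustration of two of the algorithms listed above, the sketch below runs scikit-learn's Isolation Forest and Local Outlier Factor on toy data. The data, the planted outlier at (10, 10), and the parameter values are assumptions made for this example; this is not the code from the article.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Toy data (an assumption for illustration): 200 inliers around the origin
# plus one obvious outlier appended as the last row.
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X = np.vstack([inliers, [[10.0, 10.0]]])

# Isolation Forest: points that random splits isolate quickly look anomalous.
iso = IsolationForest(n_estimators=100, random_state=0)
iso_labels = iso.fit_predict(X)  # 1 = inlier, -1 = outlier

# Local Outlier Factor: compares each point's local density with that of
# its nearest neighbors.
lof = LocalOutlierFactor(n_neighbors=20)
lof_labels = lof.fit_predict(X)  # 1 = inlier, -1 = outlier

print("Isolation Forest label for the planted point:", iso_labels[-1])
print("LOF label for the planted point:", lof_labels[-1])
```

Both estimators use the same label convention, which makes it straightforward to combine their votes into a simple ensemble.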

🔥 Exploratory Data Analysis for the popular Battle Royale game PUBG

This is a very popular Kaggle kernel with more than 800 upvotes and 30,000 views, with which I won the 1st prize for the best kernel in that Kaggle competition.

🕤 Time Series Analysis with Theory, Plots, and Code

Two articles on Towards Data Science (Part 1, Part 2). Code is available here.

What is a Time Series?
The Basic Steps in a Forecasting Task
Time Series Graphics (Time Plot, Seasonal Plot, Seasonal Subseries Plot, Lag Scatter Plot)
Time Series Components
Stationarity
Autocorrelation
Moving Average, Double and Triple Exponential Smoothing
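
To make the last item in the list above more concrete, here is a minimal pure-Python sketch of a moving average and of double exponential smoothing (Holt's linear method). The function names, the initialization of level and trend, and the `sales` series are assumptions for this example and may differ from the notebook's implementation.

```python
def moving_average(series, window):
    """Return the simple moving average over a sliding window."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i : i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def double_exponential_smoothing(series, alpha, beta, horizon=1):
    """Holt's linear method: smooth a level and a trend, then extrapolate."""
    if len(series) < 2:
        raise ValueError("series needs at least two observations")
    # Assumed initialization: first value as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Forecast by extending the last level along the last trend.
    return [level + step * trend for step in range(1, horizon + 1)]


sales = [10, 12, 14, 16, 18]  # hypothetical, perfectly linear series
smoothed = moving_average(sales, window=3)
forecast = double_exponential_smoothing(sales, alpha=0.5, beta=0.5, horizon=2)
print(smoothed)
print(forecast)
```

Triple exponential smoothing adds a third, seasonal component on top of the level and trend updates shown here.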

💥 Forecasting Wars: Classical Forecasting Methods vs Machine Learning

The task is to forecast, as precisely as possible, the unit sales (demand) of various products sold in the USA by Walmart. Competitors: Simple Exponential Smoothing, Double Exponential Smoothing, Triple Exponential Smoothing, ARIMA, SARIMA, SARIMAX, Light Gradient Boosting, Random Forest, Linear Regression.

The article is available on Towards Data Science and the code is located here.

🏡 Clustering Neighborhoods

This project is a way to practice several technologies and Data Science techniques.

Suppose that you live in Toronto, Canada (you can do this for any city that has enough data) and you have found a better job. The job is located on the other side of the city, so you decide to relocate closer. You really like your neighborhood though, and you want to find a similar one.

This code uses the venues of each neighborhood as features in a clustering algorithm (k-means) and finds similar neighborhoods.
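
The idea can be sketched as follows with scikit-learn's `KMeans`. The neighborhood names, the venue categories, and the frequency values are hypothetical, and `similar_neighborhoods` is a helper invented for this example rather than a function from the project.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical share of venues per category for each neighborhood.
venues = pd.DataFrame(
    {
        "Cafe": [0.50, 0.45, 0.05, 0.10],
        "Park": [0.10, 0.15, 0.60, 0.55],
        "Restaurant": [0.40, 0.40, 0.35, 0.35],
    },
    index=["Neighborhood A", "Neighborhood B", "Neighborhood C", "Neighborhood D"],
)
feature_columns = ["Cafe", "Park", "Restaurant"]

# Group neighborhoods whose venue profiles are close to each other.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
venues["cluster"] = kmeans.fit_predict(venues[feature_columns])


def similar_neighborhoods(name):
    """Return the other neighborhoods assigned to the same cluster as `name`."""
    cluster = venues.loc[name, "cluster"]
    same_cluster = venues.index[venues["cluster"] == cluster]
    return [other for other in same_cluster if other != name]


print(similar_neighborhoods("Neighborhood A"))
```

On real data the number of clusters is a choice to tune, and the venue frequencies would come from the scraped and geocoded venue data rather than being typed in by hand.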

Things that were used

Beautiful Soup - Package that lets us extract the content of a web page into simple text
Json - Handle json files and transform them into a pandas dataframe
Geocode - Package that converts an address to its coordinates
Scikit Learn - Machine learning package in order to use clustering
Folium - Package to create spatial maps. NOTE: Maps created with Folium are not displayed in the Jupyter notebook, so I provide links to them as static images.

📙 Pandas Tutorial

Are you starting with Data Science? Pandas is perhaps the first tool you will need. And it's really easy!

After reading (and practising) this tutorial you will know how to:

Create, add, remove and rename columns
Read, select and filter data
Retrieve statistics for data
Sort and group data
Manipulate data
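
A compact sketch of the operations listed above, using a small made-up DataFrame. The column names and values are assumptions for illustration and are not taken from the tutorial's dataset.

```python
import pandas as pd

# Create a small DataFrame (hypothetical data).
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara", "Dan"],
        "team": ["red", "blue", "red", "blue"],
        "score": [90, 70, 80, 60],
    }
)

# Add, rename, and remove columns.
df["bonus"] = df["score"] * 0.1
df = df.rename(columns={"score": "points"})
df = df.drop(columns=["bonus"])

# Select and filter rows.
high = df[df["points"] >= 80]

# Retrieve statistics.
stats = df["points"].describe()

# Sort and group.
ranked = df.sort_values("points", ascending=False)
team_means = df.groupby("team")["points"].mean()

print(high)
print(stats)
print(ranked)
print(team_means)
```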

📏 Normalization and Standardization

Normalization and standardization are designed to achieve a similar goal: creating features that have similar ranges to each other. Both are widely used in data analysis to help extract insight from raw data.

This notebook includes:

Normalization
Why normalize?
Standardization
Why standardization?
Differences?
When to use each and when not to
Python code for Simple Feature Scaling, Min-Max, Z-score, log1p transformation
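
The four transformations named above can be sketched with NumPy as follows. The input array is made up, and the z-score here uses the population standard deviation (NumPy's default), which is an assumption about the convention.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical feature

# Simple feature scaling: divide by the maximum so the largest value is 1.
simple_scaled = values / values.max()

# Min-max normalization: rescale to the [0, 1] range.
min_max = (values - values.min()) / (values.max() - values.min())

# Z-score standardization: zero mean and unit standard deviation.
z_score = (values - values.mean()) / values.std()

# log1p transformation: log(1 + x), often used to compress right-skewed data.
log_scaled = np.log1p(values)

print(simple_scaled)
print(min_max)
print(z_score)
print(log_scaled)
```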

🔧 Encoding Categorical Features

Python code on how to transform nominal and ordinal variables to integers.

This notebook includes:

Ordinal Encoding with LabelEncoder, pandas factorize, and pandas map
Nominal Encoding with One-Hot Encoding and Binary Encoding
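
A short sketch of the pandas-based encodings mentioned above. The `size` and `color` columns and the `size_order` mapping are assumptions for this example. LabelEncoder and binary encoding are left out to keep the sketch dependent on pandas only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "size": ["small", "large", "medium", "small"],  # ordinal feature
        "color": ["red", "green", "blue", "red"],       # nominal feature
    }
)

# Ordinal encoding with an explicit mapping that preserves the order.
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_mapped"] = df["size"].map(size_order)

# factorize assigns integers in order of first appearance, so the codes
# do not necessarily reflect the semantic order of the categories.
codes, uniques = pd.factorize(df["size"])
df["size_factorized"] = codes

# Nominal encoding: one indicator column per category.
one_hot = pd.get_dummies(df["color"], prefix="color")

print(df)
print(one_hot)
```

The explicit mapping is the safer choice when the order of the categories matters, since `factorize` depends on the order in which values appear in the data.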

📊 Visualizations with Seaborn

Every plot that seaborn provides is here, with examples on a real dataset.

This notebook includes:

Theory on Skewness and Kurtosis
Univariate plots. [Histogram, KDE, Box plot, Count plot, Pie chart]
Bivariate plots. [Scatter plot, Joint plot, Reg plot, KDE plot, Hex plot, Line plot, Bar plot, Violin plot, Boxen plot, Strip plot]
Multivariate plots. [Correlation Heatmap, Pair plot, Scatter plot, Line plot, Bar plot]
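
The skewness and kurtosis item in the list above can also be checked numerically before plotting. The sketch below uses pandas on two made-up series. Note that `Series.kurt()` returns excess kurtosis, so a normal distribution would score about 0.

```python
import pandas as pd

symmetric = pd.Series([1, 2, 3, 4, 5])      # hypothetical symmetric sample
right_skewed = pd.Series([1, 1, 1, 2, 10])  # hypothetical long right tail

# Skewness: 0 for symmetric data, positive when the right tail is longer.
print("skew:", symmetric.skew(), right_skewed.skew())

# Excess kurtosis: larger values indicate heavier tails.
print("kurtosis:", symmetric.kurt(), right_skewed.kurt())
```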

🕥 Feature Engineering with Dates

In this tutorial I present the datetime format that Pandas provides to handle datetime features. At the end I create a function that generates 23 features from a single one.
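
A minimal sketch of this kind of function, built on the pandas `.dt` accessor. It generates only a handful of features, and the function name `add_date_features`, the column names, and the sample dates are assumptions for this example, not the tutorial's 23-feature function.

```python
import pandas as pd


def add_date_features(df, column):
    """Return a copy of `df` with calendar features derived from `column`."""
    dates = pd.to_datetime(df[column])
    out = df.copy()
    out[f"{column}_year"] = dates.dt.year
    out[f"{column}_month"] = dates.dt.month
    out[f"{column}_day"] = dates.dt.day
    out[f"{column}_dayofweek"] = dates.dt.dayofweek  # Monday = 0
    out[f"{column}_dayofyear"] = dates.dt.dayofyear
    out[f"{column}_quarter"] = dates.dt.quarter
    out[f"{column}_is_month_start"] = dates.dt.is_month_start
    out[f"{column}_is_weekend"] = dates.dt.dayofweek >= 5
    return out


df = pd.DataFrame({"date": ["2021-03-15", "2021-03-20"]})  # hypothetical dates
features = add_date_features(df, "date")
print(features.T)
```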
