RECOMMENDER SYSTEM FOR LUCID.BLOG

INTRODUCTION

This is the task submission by Team C++ for HNG Internship 6.0. We were assigned to build a recommender system for lucid.blog. The recommender suggests who to follow and which articles to read, based on data gathered from the Lucid database.

Getting Started

The following steps were taken to achieve the objectives:

Prerequisites

We imported the necessary libraries:

import pandas as pd
import mysql.connector
from sqlalchemy import create_engine

Loaded the dataset

mydb = mysql.connector.connect(host="remotemysql.com",
                              user="8SawWhnha4",
                              passwd="zFvOBIqbIz",
                              database="8SawWhnha4")

engine = create_engine('mysql+mysqlconnector://8SawWhnha4:zFvOBIqbIz@remotemysql.com/8SawWhnha4')

Fetched the tables in the dataset

dbcursor = mydb.cursor()
dbcursor.execute('show tables')
for table in dbcursor:
    print(table)

Next we did some Exploratory Data Analysis

EXPLORATORY DATA ANALYSIS

We inspected the comments, contact_settings, ext_feed_banks, ext_rsses, following and other tables for keys relevant to the users and posts data frames, then proceeded to check the shapes of those two data frames.

users.keys()
posts.keys()
users.shape,posts.shape

To find users who share the same short_bio and posts that share the same title, we counted how often each value occurs:

users['short_bio'].value_counts()
posts['title'].value_counts()
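
As a small illustration of what this count reports, the sketch below reproduces the idea with the standard library on made-up bios. The list `short_bios` is hypothetical. Note that the tally groups exact matches only, so "similar" at this stage means identical strings.

```python
from collections import Counter

# Hypothetical short_bio values, for illustration only
short_bios = ["Web developer", "Student", "Web developer", "", "Student", "Web developer"]

# most_common() lists values from most to least frequent, like value_counts()
bio_counts = Counter(short_bios).most_common()
print(bio_counts)
```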

DATA WRANGLING

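We proceeded with data wrangling, applying the substitutions below to strip HTML tags, whitespace characters such as newlines, square-bracketed text and parenthesised image references, so that we have clean data for our model

# regex=True is required on pandas 2.0 and later, where str.replace defaults to literal matching
posts['content'] = posts['content'].str.replace(r'<[^>]*>', '', regex=True)   # HTML tags
posts['content'] = posts['content'].str.replace(r'\s', ' ', regex=True)       # newlines, tabs -> space
posts['content'] = posts['content'].str.replace(r'\[.*?\]', '', regex=True)   # [bracketed text]
posts['content'] = posts['content'].str.replace(r'\(.*?\)', '', regex=True)   # (parenthesised text, e.g. image paths)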
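
To see what these four patterns do to a single post, here is a minimal sketch that applies them in the same order with Python's `re` module. The helper `clean_content` and the sample string are hypothetical and exist only for illustration.

```python
import re

def clean_content(text):
    """Apply the four substitutions from the wrangling step to one string."""
    text = re.sub(r'<[^>]*>', '', text)    # drop HTML tags
    text = re.sub(r'\s', ' ', text)        # turn every whitespace character into a space
    text = re.sub(r'\[.*?\]', '', text)    # drop square-bracketed text
    text = re.sub(r'\(.*?\)', '', text)    # drop parenthesised text such as image paths
    return text

sample = "<p>Hello\nworld ![logo](img/logo.png)</p>"  # hypothetical post body
cleaned = clean_content(sample)
print(repr(cleaned))
```

On this sample the leading `!` of the Markdown image survives, and any ordinary parenthesised remark in a post would be removed as well, so the patterns are a coarse filter and not a full Markdown or HTML parser.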

MODEL FOR WHO TO FOLLOW ON LUCID.BLOG

Building the model took a few steps:

Imported relevant modules

import pandas as pd
from sklearn.metrics.pairwise import linear_kernel
from sklearn.feature_extraction.text import TfidfVectorizer

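Filled missing values with an empty string and computed the TF-IDF matrix required for calculating cosine similarity

users['short_bio'] = users['short_bio'].fillna('')
# The vectorizer is not defined in this excerpt; default settings are assumed here
lucid_tfidf = TfidfVectorizer()
users_short_bio_matrix = lucid_tfidf.fit_transform(users['short_bio'])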

Checked the shape of the TF-IDF matrix (one row per user, one column per distinct term)

users_short_bio_matrix.shape
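
For readers who want to see what a TF-IDF matrix contains, the following is a rough standard-library sketch. It assumes plain whitespace tokenisation, which is simpler than TfidfVectorizer's default tokenizer, and uses the smoothed IDF formula described in the scikit-learn documentation. The function `tfidf_rows` and the sample bios are hypothetical.

```python
import math
from collections import Counter

def tfidf_rows(documents):
    """Return (vocabulary, rows) where each row is an L2-normalised TF-IDF vector."""
    tokenised = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenised for term in tokens})
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {term: math.log((1 + n_docs) / (1 + doc_freq[term])) + 1 for term in vocabulary}
    rows = []
    for tokens in tokenised:
        counts = Counter(tokens)
        row = [counts[term] * idf[term] for term in vocabulary]
        norm = math.sqrt(sum(v * v for v in row))
        # An empty bio yields an all-zero row, so avoid dividing by zero
        rows.append([v / norm for v in row] if norm else row)
    return vocabulary, rows

bios = ["python developer", "frontend developer", ""]  # hypothetical short_bio values
vocabulary, rows = tfidf_rows(bios)
print(vocabulary)
print(len(rows), len(rows[0]))  # rows x columns, analogous to the matrix shape
```

The two printed numbers play the role of the shape above: the number of bios and the number of distinct terms. A bio that was missing and filled with an empty string becomes a row of zeros.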

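Calculated the cosine similarity for our users_short_bio_matrix and got our user indices (user_indices is built in the notebook and is not shown here)
Created a function to recommend users to follow based on similarities in their short_bio

# linear_kernel is imported above; this call is assumed, as the original line is not shown
cosine_similarity = linear_kernel(users_short_bio_matrix, users_short_bio_matrix)

def recommend_to_follow(index, cosine_sim=cosine_similarity):
    if index < users_short_bio_matrix.shape[0]:
        idx = user_indices[index]
        # Pair every user position with its similarity to the chosen user
        similarity_scores = list(enumerate(cosine_sim[idx]))
        # Sort from most to least similar
        similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
        # Skip the first entry (normally the user themselves) and keep the next five
        similarity_scores = similarity_scores[1:6]
        lucid_index = [i[0] for i in similarity_scores]
        # Return the top 5 most similar names
        return users['name'].iloc[lucid_index]
    else:
        return "No recommendations for this user"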
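
The ranking step in the function above can be sketched without any libraries. The version below is an illustration under stated assumptions, not the team's implementation: `cosine`, `top_similar`, the names and the vectors are all hypothetical. It excludes the queried row explicitly, whereas a `[1:6]` slice assumes the queried user always sorts first, which may not hold for an empty bio or when two users have identical bios.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_similar(index, vectors, names, n=5):
    """Rank all rows by similarity to row `index`, excluding the row itself."""
    scores = [(i, cosine(vectors[index], row)) for i, row in enumerate(vectors) if i != index]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [names[i] for i, _ in scores[:n]]

# Hypothetical term-count vectors and names, for illustration only
names = ["Ada", "Bola", "Chidi", "Dayo"]
vectors = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [1, 1, 1]]
result = top_similar(0, vectors, names, n=2)
print(result)
```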

MODEL FOR WHAT ARTICLE TO READ ON LUCID.BLOG

The same steps were repeated for posts:

Filled missing values with an empty string
Computed the TF-IDF matrix required for calculating cosine similarity
Checked the shape of our posts matrix
Calculated the cosine similarity for our posts matrix
Got our post indices
Created a function to recommend articles to read by getting the pairwise similarity scores, sorting them and returning the top 5

def recommend_article_to_read(index, cosine_sim=cosines_similarity):
    if index < posts_matrix.shape[0]:
        idx = posts_indices[index]
        # Get the pairwise similarity scores of all posts,
        # sort them and keep the top 5
        similarity_scores = list(enumerate(cosine_sim[idx]))
        similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
        similarity_scores = similarity_scores[1:6]
        # Get the post positions
        lucid_index = [i[0] for i in similarity_scores]
        # Return the titles of the top 5 most similar posts
        return posts['title'].iloc[lucid_index]
    else:
        return "No recommendations for this post"
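
Both functions look up rows in a precomputed similarity matrix. Assuming that matrix comes from linear_kernel (it is imported above, though the call itself is not shown), the plain dot product it computes coincides with cosine similarity because TfidfVectorizer scales every row to unit L2 length by default. For two TF-IDF rows u and v:

```latex
\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert_2 \, \lVert v \rVert_2},
\qquad
\lVert u \rVert_2 = \lVert v \rVert_2 = 1
\;\Longrightarrow\;
\cos(u, v) = u \cdot v
```

The identity does not apply to an all-zero row, such as an empty bio or post, whose dot product with every other row is simply 0.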

HOW THE RECOMMENDER SYSTEM WORKS

TEST FOR WHO TO FOLLOW RECOMMENDER SYSTEM FOR LUCID.BLOG

recommend_to_follow(50)
#This displays something like:
105     Damilare Olabimtan
279        UDENKWOR NKECHI
438         Angela Egerega
604    chukwuemeka anyanwu
641          Deborah Ajayi

TEST FOR WHAT ARTICLE TO READ RECOMMENDER SYSTEM FOR LUCID.BLOG

recommend_article_to_read(24)
#This displays something like:
0    I learnt how to use the table tag as i have us...
1     I am on this journey with start.ng, and here ...
2    I have not been attending classes on the HNG c...
3    My journey on **StartNG** pre-internship progr...
4     A Summary on The “idongesit.html” CV, Its Str...

Built with Anaconda's Jupyter notebook by TEAM C++

Link to Lucid post https://lucid.blog/grace.eye73/post/team-c-recommender-system-2d8
