SEA-final-project

A movie search engine plus a customized recommendation system.

#Constants files: Google Drive

Working Procedure

1. Split data into many partitions

Note: the number of partitions should correspond to the number of backend workers.

Default: (NumSuperFront, NumMaster, NumMovie, NumReview, NumIdx, NumDoc) = (1, 3, 3, 3, 3, 3)

python -m src.reformatter <# of partitions for review> <# of partitions for movie>
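
How `src.reformatter` assigns records to partitions is not shown in this README. As a rough sketch of the idea, assuming each record has a string ID and that a partition is chosen by a stable hash of that ID (the helper name `partition_records` and the sample data are hypothetical):

```python
import zlib


def partition_records(records, num_partitions):
    """Group (record_id, payload) pairs into num_partitions buckets.

    Hypothetical helper: uses CRC32 of the ID so the same ID always
    lands in the same partition across runs and machines.
    """
    partitions = [[] for _ in range(num_partitions)]
    for record_id, payload in records:
        index = zlib.crc32(record_id.encode("utf-8")) % num_partitions
        partitions[index].append((record_id, payload))
    return partitions


# Made-up sample data; 3 partitions matches the default NumMovie.
movies = [
    ("12900", "movie A"),
    ("13217", "movie B"),
    ("11705", "movie C"),
    ("12900", "movie A again"),
]
parts = partition_records(movies, 3)
```

Because the partition index depends only on the ID, a backend worker can later locate a record by hashing its ID the same way.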

2. Call mapreduce workers

python -m mapreduce.workers

3. Call classification workers

python -m classification.workers

4. Prepare pickle files for all servers

python -m Prepare
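
What `Prepare` actually writes is not documented here. As a minimal sketch of the general pattern, assuming each server loads one pickled dictionary per partition (the file naming scheme and the helper names `save_partition` and `load_partition` are hypothetical):

```python
import os
import pickle
import tempfile


def save_partition(data, directory, name, index):
    """Pickle one partition to <directory>/<name>_<index>.pkl (hypothetical layout)."""
    path = os.path.join(directory, "{0}_{1}.pkl".format(name, index))
    with open(path, "wb") as handle:
        # Protocol 2 can be read by both Python 2 and Python 3.
        pickle.dump(data, handle, protocol=2)
    return path


def load_partition(path):
    """Load a partition written by save_partition."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round trip with made-up data in a temporary directory.
out_dir = tempfile.mkdtemp()
path = save_partition({"12900": "movie A"}, out_dir, "movie", 0)
restored = load_partition(path)
```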


## 5. Start All servers
Goal: 1. find ports, 2. indexing, 3. fire up all servers

python ./StartAll.py
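
`StartAll.py` itself is not reproduced in this README. One common way to do the first step, finding free ports, is to bind a socket to port 0 and let the operating system choose. The sketch below assumes that approach and uses a hypothetical helper name, `find_free_ports`:

```python
import socket


def find_free_ports(count, host="127.0.0.1"):
    """Return `count` distinct ports that were free at the time of the call."""
    sockets = []
    ports = []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.bind((host, 0))  # port 0 asks the OS for any free port
            sockets.append(sock)  # keep it open so the next bind gets a different port
            ports.append(sock.getsockname()[1])
    finally:
        for sock in sockets:
            sock.close()
    return ports


# Three ports, e.g. one per index server in the default configuration.
ports = find_free_ports(3)
```

A port found this way can be taken by another process before the server binds to it, so the server startup should still handle a bind failure.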


## 6. Fire up frontend (Google App Engine)
https://cloud.google.com/sdk/#Quick_Start

dev_appserver.py --host=localhost --port=8080 frontend


#Structure:
The fired-up HTTP servers are structured as follows:

                        --> classifier_front(?) --> ?
    User --> SuperFront --> searchEng_front     --> searchEng_worker (including IndexServer3 and DocumentServer3)
                        --> recom_front         --> recom_worker (including MovieServer3 and ReviewServer3)

#Recommendation System:
###Goal: get the user ID --> check the user log for the review history --> query MovieServer for similar critics --> query ReviewServer for movies sorted by weighted rating
###Structure and Usage:

recom_front --> MovieServer3 --> ReviewServer3
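
The README does not spell out the similarity measure or the weighting formula. A minimal sketch of the pipeline, assuming critic similarity is derived from the mean absolute score difference over shared movies and that a movie's weighted rating is the similarity-weighted average of critic scores (all function names and scores below are hypothetical):

```python
def critic_similarity(user_scores, critic_scores):
    """Similarity in (0, 1] from the mean absolute difference on shared movies; 0 if none."""
    shared = set(user_scores) & set(critic_scores)
    if not shared:
        return 0.0
    total_diff = sum(abs(user_scores[m] - critic_scores[m]) for m in shared)
    return 1.0 / (1.0 + total_diff / float(len(shared)))


def recommend(user_scores, critics):
    """Rank movies the user has not reviewed by similarity-weighted average score."""
    totals, weights = {}, {}
    for critic_scores in critics.values():
        sim = critic_similarity(user_scores, critic_scores)
        if sim == 0.0:
            continue  # no shared movies, so this critic carries no weight
        for movie, score in critic_scores.items():
            if movie in user_scores:
                continue  # skip movies the user already reviewed
            totals[movie] = totals.get(movie, 0.0) + sim * score
            weights[movie] = weights.get(movie, 0.0) + sim
    ranked = [(movie, totals[movie] / weights[movie]) for movie in totals]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up scores on an assumed 1-5 scale.
user = {"12900": 4.0, "13217": 2.0}
critics = {
    "Roger_Ebert": {"12900": 4.0, "13217": 2.0, "11705": 5.0, "770876740": 1.0},
    "Emanuel_Levy": {"12900": 1.0, "11705": 2.0, "770876740": 4.0},
}
recommendations = recommend(user, critics)
```

In the deployed system the similar-critic lookup and the rating aggregation are split across MovieServer3 and ReviewServer3; the sketch keeps both in one process only to show the data flow.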

#recom_front api:

http://linserv2.cims.nyu.edu:46829/recom?user=UserID (e.g. http://linserv2.cims.nyu.edu:46829/recom?user=d0aa6e9b-676b-428f-9758-65e7c09b38a4)

#MovieServer api:

http://linserv2.cims.nyu.edu:46831/movie?movieID=MovieIDs (e.g. http://linserv2.cims.nyu.edu:46831/movie?movieID=770802394+770882996+12900+13217+11705+770876740+770710325+771362322+533693794+348462568)

#ReviewServer api:

http://linserv2.cims.nyu.edu:46834/review?critics=CRITICS (e.g. http://linserv2.cims.nyu.edu:46834/review?critics=Emanuel_Levy+Roger_Ebert)
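
Judging from the example URLs, multiple movie IDs or critic names are joined with `+` in the query string. A small sketch for building such URLs follows; the helper name `build_query_url` is hypothetical, and the host and ports are simply the ones quoted in the examples, which may no longer be live:

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def build_query_url(base, path, param, values):
    """Build e.g. http://host:port/review?critics=A+B from a list of values."""
    # urlencode turns the joining spaces into '+', matching the example URLs.
    query = urlencode({param: " ".join(values)})
    return "{0}/{1}?{2}".format(base, path, query)


review_url = build_query_url(
    "http://linserv2.cims.nyu.edu:46834", "review", "critics",
    ["Emanuel_Levy", "Roger_Ebert"],
)
movie_url = build_query_url(
    "http://linserv2.cims.nyu.edu:46831", "movie", "movieID",
    ["770802394", "770882996", "12900"],
)
```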


The current user log is created by:

python ./src/createFakeUserLog.py

This creates 50 users, each with a unique ID and 20 reviews with random scores on random movies.

The log is saved at ../userLog/myUserBook
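
`createFakeUserLog.py` is not shown here. The following sketch reproduces the behaviour described above under a few assumptions: user IDs are UUID4 strings (consistent with the example ID in the recom_front URL), scores are integers from 1 to 5, and each user reviews 20 distinct movies. The function name, the score range, and the movie IDs are hypothetical.

```python
import random
import uuid


def create_fake_user_log(movie_ids, num_users=50, reviews_per_user=20, seed=None):
    """Return {user_id: {movie_id: score}} with random scores on random movies."""
    rng = random.Random(seed)
    log = {}
    for _ in range(num_users):
        # Build a version-4 UUID from seeded random bits so runs are repeatable.
        user_id = str(uuid.UUID(int=rng.getrandbits(128), version=4))
        chosen = rng.sample(movie_ids, reviews_per_user)  # distinct movies per user
        log[user_id] = {movie: rng.randint(1, 5) for movie in chosen}  # assumed 1-5 scale
    return log


# Placeholder movie IDs; the real IDs come from the crawler output.
fake_movie_ids = [str(n) for n in range(1000, 1100)]
user_log = create_fake_user_log(fake_movie_ids, seed=0)
```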



#TomatoCrawler
##Goal: fetch data from the Rotten Tomatoes website and save it in a usable form
Now we have:
- 250 movies to search
- 1718 movieIDs returned
```python
#To have tomatoCrawler save Movie_fs, Review_fs, and IDs_fs to the file system
from src import tomatoCrawler
tomatoCrawler.main2FS()

#Or have tomatoCrawler save Movie_dict, Review_fs, and IDs_fs to ./constants as pickle files
tomatoCrawler.main2NormalDict()
```

#File System module Usage
##Distributed dictionary object

```python
from fs import DisTable

#Creating an object
a = DisTable()
# or
b = DisTable({ 1: 'a', 2: 'b', 3: 'c'})

#Set a key-value pair
a[1] = 'a'
a[2] = 'b'
#Get a value with key
a[1]
#returns 'a'

#Pop operation
a.pop(1)
#returns 'a' and removes (1, 'a') from the dictionary

#hasKey operation
a.hasKey(2)
#returns True
a.hasKey(1)
#returns False

#Length property
a.length
#returns 1

#Pretty print of dictionary
print a
#1
#   a
'''
key1
  value1
  value2
  ...
key2
  value1
  value2
  ...
'''
```

##Distributed List

```python
from fs import DisList

#Creating an object
a = DisList()
# or
b = DisList([1, 2, 3, 4])

#Append/Extend a value into list
a.append(1)
a.append(2)
a.extend(3)
a.extend(4)

#Get a value given position
a[0]
#returns 1
a[1]
#returns 2

#Update value to given position
a[1] = 3
print a
#[ 1 3 3 4 ]

#Remove value from list
a.remove(1)
print a
#[ 3 3 4 ]
a.remove(3, globl=True)
print a
#[ 4 ]

#Pop operation
a.pop(1)
#returns the popped value and removes it from the list

#Length property
a.length
#returns 1
```