Enter the starting point and destination of a future trip, and get an estimate of how long it will take and how much it will cost.
The program continuously requests real-time estimates for rides generated by a stochastic process, and stores those estimates. The estimate for a given trip is then computed by searching this historical data for similar trips.
For instance, suppose you want to know how long it will take and how much it will cost to get from the Marriott to the airport two months from now, when you are flying home from vacation. You enter the trip and the time into a web/mobile interface, and get back a duration in minutes and a price in dollars (or local currency).
This project is developed with:
Once those tools are installed on your system, get the code and set up the virtual environment like so:
$ git clone git@github.com:mroll/rate-predictr.git
$ cd rate-predictr
$ mkvirtualenv rate-predictr
$ pip install -r requirements.txt
Commands currently supported by the CLI:
$ ./main.py add_location boston_common --lat=42.354706 --lng=-71.066450
This will add a row to the location database table. The row will have the name boston_common and the latitude and longitude values seen above. This is a convenience method for adding locations to be used by the get_costs command.
$ ./main.py get_costs --center=boston_common --radius=R N
This will query the Lyft API for cost estimates for N trips randomly selected within an R-mile radius around center. The trips and the returned estimates will be recorded in the database.
See util.py:random_point_in_circle for how the coordinates of the start and end location of the trip are generated.
The program needs to generate start and end locations to send to Lyft for cost estimates. The start and end locations are parsed by the Lyft API as (lat,lng) pairs, so our algorithm should generate (lat,lng) pairs.
To generate a batch of coordinate pairs, we can draw from a set of points in some bounded area, using a probability distribution. Right now the code uses a circular area around a given center, and a uniform distribution. In the future, more complicated polygons and distributions could be used, to reflect actual Lyft usage.
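As an illustration of uniform sampling over a circular area, here is a minimal sketch. It is not necessarily what util.py:random_point_in_circle does; the function name sample_point_in_circle, the Earth-radius constant, and the flat-earth approximation (reasonable for radii of a few miles) are all assumptions made for this example.

```python
import math
import random

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (assumed constant for this sketch)


def sample_point_in_circle(center_lat, center_lng, radius_miles, rng=random):
    """Return a (lat, lng) pair drawn uniformly from a circle around the center.

    Hypothetical helper: uses a flat-earth approximation, so it is only
    suitable for small radii.
    """
    # Taking the square root of a uniform variate makes the density uniform
    # over the circle's area instead of clustering points near the center.
    distance = radius_miles * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)

    # Convert the north/east offsets from miles to angular offsets in radians.
    dlat = (distance * math.cos(bearing)) / EARTH_RADIUS_MILES
    dlng = (distance * math.sin(bearing)) / (
        EARTH_RADIUS_MILES * math.cos(math.radians(center_lat))
    )
    return (center_lat + math.degrees(dlat), center_lng + math.degrees(dlng))


def sample_trip(center_lat, center_lng, radius_miles, rng=random):
    """Return a (start, end) pair of independently sampled points."""
    start = sample_point_in_circle(center_lat, center_lng, radius_miles, rng)
    end = sample_point_in_circle(center_lat, center_lng, radius_miles, rng)
    return start, end
```

Without the square root, half of the sampled points would fall within half the radius, which covers only a quarter of the area. Swapping in a different distribution later would only require changing how distance and bearing are drawn.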
Given a start location, an end location, and a time, how do you find the most likely cost and duration of a ride?
Since the (lat,lng) pairs are real numbers that go out to several decimal places, we cannot rely on having exact matches in the database for a given trip. An approximation must be good enough.
One idea is to look for trips with a start location within some distance X of the given start location, and an end location within that same distance X of the given end location, and then take the average cost of those trips. Other parameters, such as time of year, weather, or time of day, could also be used as filters to make the approximation more accurate.
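The distance-X idea can be sketched as follows. This is only an illustration under assumptions: the trip records are plain dictionaries with hypothetical keys (start, end, cost, duration_min) rather than rows from the project's database, and the haversine formula is used as one reasonable way to measure distance between (lat,lng) pairs.

```python
import math
from statistics import mean

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (assumed constant for this sketch)


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lng) pairs in degrees."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def estimate_trip(trips, start, end, max_miles):
    """Average cost and duration of recorded trips near the requested one.

    `trips` is an iterable of dicts with the hypothetical keys
    "start", "end", "cost", and "duration_min".
    Returns (mean_cost, mean_duration_min), or None if nothing matches.
    """
    matches = [
        t for t in trips
        if haversine_miles(t["start"], start) <= max_miles
        and haversine_miles(t["end"], end) <= max_miles
    ]
    if not matches:
        return None  # no similar trips recorded; caller may widen max_miles
    return (
        mean(t["cost"] for t in matches),
        mean(t["duration_min"] for t in matches),
    )
```

Additional filters such as time of day could be added as extra conditions in the list comprehension. Returning None when there are no matches leaves it to the caller to decide whether to widen the search distance.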