zezutom/haversine

I recently enrolled in Introduction to Data Science. One of the very first assignments was a Twitter sentiment analysis performed in Python. Leaving a whole lot aside, what caught my attention was a requirement to resolve tweets' geocoded locations WITHOUT relying on third-party services.

The assignment paper suggested using a Python Dictionary of State Abbreviations, which proved helpful indeed. I decided to combine this resource with Average Latitude and Longitude for US States and ended up with a single dictionary containing all the essential information, i.e. state codes, names and coordinates:

{
  'AK': {'name':'Alaska','coords':[61.3850,-152.2683]},
  'AL': {'name':'Alabama','coords':[32.7990,-86.8073]},
  'AR': {'name':'Arkansas','coords':[34.9513,-92.3809]},
  'AS': {'name':'American Samoa','coords':[-14.2417,-170.7197]},
  'AZ': {'name':'Arizona','coords':[33.7712,-111.3877]},
  'CA': {'name':'California','coords':[36.1700,-119.7462]},
  'CO': {'name':'Colorado','coords':[39.0646,-105.3272]},
  'CT': {'name':'Connecticut','coords':[41.5834,-72.7622]},
  'DC': {'name':'District of Columbia','coords':[38.8964,-77.0262]},
  'DE': {'name':'Delaware','coords':[39.3498,-75.5148]},
  'FL': {'name':'Florida','coords':[27.8333,-81.7170]},
  'GA': {'name':'Georgia','coords':[32.9866,-83.6487]},
  'HI': {'name':'Hawaii','coords':[21.1098,-157.5311]},
  'IA': {'name':'Iowa','coords':[42.0046,-93.2140]},
  ...
}

The complete dictionary can be found in us_states.py.
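Combining the two sources can be sketched roughly like this. `STATE_NAMES`, `AVERAGE_COORDS` and `merge_states` are hypothetical names holding small excerpts of the two resources (values copied from the listing above); us_states.py may well have been assembled differently, e.g. by hand.

```python
# Hypothetical excerpts of the two sources: state codes -> names,
# and state codes -> average (latitude, longitude).
STATE_NAMES = {'AK': 'Alaska', 'AL': 'Alabama', 'CA': 'California'}
AVERAGE_COORDS = {
    'AK': (61.3850, -152.2683),
    'AL': (32.7990, -86.8073),
    'CA': (36.1700, -119.7462),
}


def merge_states(names, coords):
    """Build {code: {'name': ..., 'coords': [lat, lon]}} from the two sources.

    Codes that appear in only one of the sources are skipped.
    """
    return {
        code: {'name': name, 'coords': list(coords[code])}
        for code, name in names.items()
        if code in coords
    }


US_STATES = merge_states(STATE_NAMES, AVERAGE_COORDS)
print(US_STATES['CA'])  # {'name': 'California', 'coords': [36.17, -119.7462]}
```

Storing `coords` as a list mirrors the listing above; a tuple would serve lookups equally well.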

With all the relevant information in place, I was looking for a feasible way of associating the tweets with the US states. It turns out that the Haversine formula is one of the most popular methods for calculating the distance between two points given by their latitude and longitude.
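For reference, the formula in its usual form, with latitudes $\varphi_1, \varphi_2$ and longitudes $\lambda_1, \lambda_2$ in radians and $R$ the radius of the sphere (roughly 6371 km when using the Earth's mean radius):

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
   + \cos\varphi_1 \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
c &= 2 \operatorname{atan2}\!\left(\sqrt{a}, \sqrt{1 - a}\right) \\
d &= R \, c
\end{aligned}
```

Here $a$ is the haversine of the central angle $c$ (since $\operatorname{hav}(\theta) = \sin^2(\theta/2)$), and $d$ is the great-circle distance between the two points.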

My implementation of the Haversine formula merely mirrors a Python example at platoscave.net. Here is the result (see us_states.py for full details):

import math

# Method of the USStates class (us_states.py); self.R holds the Earth's radius
def haversine(self, origin, destination):
  # two (latitude, longitude) pairs in decimal degrees, i.e. origin vs destination
  lat1, lon1 = origin
  lat2, lon2 = destination

  # differences between destination and origin coordinates, in radians
  dlat = math.radians(lat2 - lat1)
  dlon = math.radians(lon2 - lon1)

  # the haversine of the central angle between the two points
  a = math.sin(dlat/2) * math.sin(dlat/2) + math.cos(math.radians(lat1)) \
      * math.cos(math.radians(lat2)) * math.sin(dlon/2) * math.sin(dlon/2)

  # the central angle (angular distance) between the two points, in radians
  c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

  # the great-circle distance on a perfect sphere, i.e. the Earth's
  # flattening, hills etc. are not considered
  return self.R * c
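For experimenting outside the class, here is a standalone sketch of the same computation. It assumes a mean Earth radius of 6371 km, which may differ from the value of `self.R` in us_states.py; the `radius` parameter and `EARTH_RADIUS_KM` constant are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; us_states.py may use another value


def haversine(origin, destination, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two (lat, lon) pairs given in decimal degrees."""
    lat1, lon1 = origin
    lat2, lon2 = destination
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    # haversine of the central angle
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    # central angle in radians, scaled by the sphere's radius
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return radius * c


# One degree of longitude along the equator, i.e. 2 * pi * R / 360
print(round(haversine((0.0, 0.0), (0.0, 1.0)), 2))  # 111.19
```

Two antipodal points on the equator come out at exactly half the circumference, pi * R, which is a handy sanity check.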

The algorithm above is the core of my custom search method, which simply picks the state whose average coordinates are closest to the provided coordinates (i.e. the minimum distance). To filter out locations outside the US, I set a hard limit of 500 km on the distance between the provided coordinates and the average coordinates of the nearest state; if no state lies within that range, there is no match. This leaves me with a nice and handy feature:

def main():
  # USStates is defined in us_states.py
  us_states = USStates()

  # Sacramento, California - prints CA
  print(us_states.by_coords(38.3454, -121.2935))

  # Austin, Texas - prints TX
  print(us_states.by_coords(30.25, -97.75))

  # New Delhi, India - yields no result,
  # as the minimum calculated distance is far beyond the 500 km limit
  print(us_states.by_coords(28.6139, 77.2089))
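A minimal sketch of that nearest-state search follows. `closest_state` is a hypothetical stand-in for the repository's `by_coords`, `STATES` holds only three entries copied from the listing above, and the 6371 km Earth radius is an assumption. Bear in mind that matching against average coordinates is a heuristic: a point near a border can lie closer to a neighboring state's centre than to the centre of the state that actually contains it.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius
MAX_DISTANCE_KM = 500.0   # the cutoff described in the text

# Hypothetical three-state excerpt; the full dictionary lives in us_states.py.
STATES = {
    'AZ': {'name': 'Arizona', 'coords': [33.7712, -111.3877]},
    'CA': {'name': 'California', 'coords': [36.1700, -119.7462]},
    'FL': {'name': 'Florida', 'coords': [27.8333, -81.7170]},
}


def haversine(origin, destination, radius=EARTH_RADIUS_KM):
    """Great-circle distance in km between two (lat, lon) pairs in decimal degrees."""
    lat1, lon1 = origin
    lat2, lon2 = destination
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    return radius * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def closest_state(lat, lon, states=STATES, max_km=MAX_DISTANCE_KM):
    """Hypothetical stand-in for by_coords: nearest state code, or None if too far."""
    best_code, best_km = None, float('inf')
    for code, info in states.items():
        km = haversine((lat, lon), info['coords'])
        if km < best_km:
            best_code, best_km = code, km
    # Reject the match if even the nearest state centre is beyond the cutoff.
    return best_code if best_km <= max_km else None


print(closest_state(38.3454, -121.2935))  # CA
print(closest_state(28.6139, 77.2089))    # None
```

A linear scan is perfectly adequate for a few dozen states; a spatial index would only pay off with far more reference points.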

One last note: the coordinates are latitude and longitude pairs in signed decimal degrees, without compass directions. Negative numbers represent south or west, for example:

#   latitudes:
#   30° 27′ 18″N -> 30.4550
#   28° 36′ 50″S -> -28.6139
#
#   longitudes:
#   77° 12′ 32″E -> 77.2089
#   30° 27′ 18″W -> -30.4550
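Converting from degrees, minutes and seconds follows decimal = degrees + minutes/60 + seconds/3600, negated for S and W. A small helper might look like this; `dms_to_decimal` is a hypothetical name and is not part of us_states.py.

```python
def dms_to_decimal(degrees, minutes, seconds, direction):
    """Convert degrees/minutes/seconds plus a compass letter to signed decimal degrees.

    South and west become negative, matching the convention described above.
    """
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError('minutes and seconds must be in [0, 60)')
    direction = direction.upper()
    if direction not in ('N', 'S', 'E', 'W'):
        raise ValueError('direction must be one of N, S, E, W')
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if direction in ('S', 'W') else value


print(round(dms_to_decimal(30, 27, 18, 'N'), 4))  # 30.455
print(round(dms_to_decimal(28, 36, 50, 'S'), 4))  # -28.6139
```

The range check rejects malformed input such as 61 minutes, which has no valid decimal equivalent.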

us_states.py contains the full implementation, while us_states_test.py contains unit tests covering the main scenarios as well as some edge cases.

About

An application of the Haversine formula for calculating the spherical distance between two locations
