def prep_data(db: PostgreSQL): """ 1) Import necessary shapefiles to PostGIS 2) Extract nodes from street segments within study area 3) Assign closest street node ID to all SEPTA bus stops near study area """ # 1) Import necessary shapefiles to PostGIS # ----------------------------------------- all_tables = db.all_tables_as_list() for sql_tablename, shp_path_suffix in [ ("philly_streets", "philly complete streets/philly_complete_streets.shp"), ("septa_bus_stops_fall_2019", "Fall_2019_Stops_By_Route/Fall_2019_Stops_By_Route.shp"), ("study_bounds", "Draft_Study_Area_Extent/U_CIty_Study_Area_Dissolve_2.shp"), ]: if sql_tablename not in all_tables: full_path = GIS_FOLDER / shp_path_suffix db.import_geodata(sql_tablename, full_path) # 2) Extract nodes from street segments within study area # ------------------------------------------------------- point_query = """ with raw as ( select st_startpoint(geom) as startpoint, st_endpoint(geom) as endpoint from philly_streets where st_intersects( geom, (select st_collect(geom) from study_bounds) ) ), merged_data as ( select startpoint as geom from raw union select endpoint as geom from raw ) select row_number() over() as streetnodeid, geom from merged_data group by geom """ db.make_geotable_from_query(point_query, "street_nodes", "POINT", 26918)
def db_setup(database: str, folder: str): """Roll a starter database from DVRPC's production DB""" folder = Path(folder) db = PostgreSQL(database, verbosity="minimal", **CREDENTIALS["localhost"]) create_project_database(db, folder)
def add_segmentation_to_sidewalks(db: PostgreSQL): # Split the sidewalks wherever they intersect a trail # --------------------------------------------------- sidewalk_split = """ select globalid, (st_dump( st_split( s.geom, (select st_collect(geom) from ped_trails t where st_intersects(s.geom, t.geom) ) ) )).geom from pedestriannetwork_lines s """ db.make_geotable_from_query(sidewalk_split, "sidewalk_splits", geom_type="LINESTRING", epsg=26918) # Merge the split sidewalks with any sidewalks that didn't get split # ------------------------------------------------------------------ sidewalk_merge = """ select 'sidewalk - raw' as src, geom from pedestriannetwork_lines where not st_within(geom, (select st_buffer(st_collect(geom), 0.5) from sidewalk_splits) ) union select 'sidewalk - split' as src, geom from sidewalk_splits """ db.make_geotable_from_query(sidewalk_merge, "sidewalk_merged", geom_type="LINESTRING", epsg=26918)
def generate_islands(db: PostgreSQL, schema: str): """ Use the sidewalk layer to merge intersecting geometries. The output is a layer with one feature per 'island' """ query = f""" SELECT ST_COLLECTIONEXTRACT( UNNEST(ST_CLUSTERINTERSECTING(geom)), 2 ) AS geom FROM {schema}.sidewalks """ db.make_geotable_from_query(query, "islands", "MULTILINESTRING", 26918, schema=schema)
def _test_make_geotable_from_query(db: PostgreSQL, shp: DataForTest): new_geotable = "test_make_geotable_multilinestring" query = f""" SELECT ST_UNION(geom) AS geom FROM {shp.NAME} """ # Make a new geotable db.make_geotable_from_query(query, new_geotable, geom_type="MULTILINESTRING", epsg=shp.EPSG) # Confirm that the new table's EPSG matches the expected value epsg = db.all_spatial_tables_as_dict()[new_geotable] assert epsg == shp.EPSG
def import_production_sql_data(remote_db: PostgreSQL, local_db: PostgreSQL): """ Copy data from DVRPC's production SQL database to a SQL database on localhost. Note - this connection will only work within the firewall. """ data_to_download = [ ("transportation", ["pedestriannetwork_lines", "pedestriannetwork_points", "pa_centerline", "CircuitTrails", ]), ("structure", ["points_of_interest"]), ("boundaries", ["municipalboundaries"]), ] for schema, table_list in data_to_download: for table_name in table_list: # Extract the data from the remote database and rename the geom column query = f"SELECT * FROM {schema}.{table_name};" gdf = remote_db.query_as_geo_df(query, geom_col="shape") gdf = gdf.rename(columns={"shape": "geometry"}).set_geometry("geometry") # Check if we have multipart geometries. # If so, explode them (but keep the index) multipart = False for geom_type in gdf.geom_type.unique(): if "Multi" in geom_type: multipart = True if multipart: print(f"EXPLODING {table_name}") gdf = gdf.explode() gdf['explode'] = gdf.index gdf = gdf.reset_index() # Save to the local database local_db.import_geodataframe(gdf, table_name.lower())
def analyze_network(schema: str, database: str, speed: str): """Run the sidewalk network analysis with Pandana""" try: speed = float(speed) except ValueError: speed = None db = PostgreSQL(database, verbosity="minimal", **CREDENTIALS["localhost"]) network = SidewalkNetwork(db, schema, walking_mph=speed)
def create_new_geodata(db: PostgreSQL): """ 1) Merge DVRPC municipalities into counties 2) Filter POIs to those within DVRPC counties """ pa_counties = "('Bucks', 'Chester', 'Delaware', 'Montgomery', 'Philadelphia')" nj_counties = "('Burlington', 'Camden', 'Gloucester', 'Mercer')" # Add regional county data regional_counties = f""" select co_name, state_name, (st_dump(st_union(geom))).geom from public.municipalboundaries m where (co_name in {pa_counties} and state_name ='Pennsylvania') or (co_name in {nj_counties} and state_name = 'New Jersey') group by co_name, state_name """ db.make_geotable_from_query( regional_counties, "regional_counties", "Polygon", 26918, schema="public" ) # Clip POIs to those inside DVRPC's region regional_pois = """ select * from public.points_of_interest where st_intersects(geom, (select st_collect(geom) from public.regional_counties)) """ db.make_geotable_from_query( regional_pois, "regional_pois", "Point", 26918, schema="public" )
def _test_import_csv_matches(db: PostgreSQL, csv: DataForTest): # Import a CSV file df = pd.read_csv(csv.PATH_URL) # Get the number of rows in the raw dataframe csv_row_count, _ = df.shape # Get the number of rows in the new SQL table query = f""" SELECT COUNT(*) FROM {csv.NAME} """ db_table_row_count = db.query_as_single_item(query) # Confirm that the dataframe & SQL table have matching rowcounts assert csv_row_count == db_table_row_count
def clip_data(state: str, municipality: str, buffer: str, database): """Clip source data down to a single municipality""" if municipality == "": municipality = None try: buffer = float(buffer) except ValueError: buffer = None db = PostgreSQL(database, verbosity="minimal", **CREDENTIALS["localhost"]) clip_inputs(db, state, municipality=municipality, buffer_meters=buffer)
def export_data(db: PostgreSQL): """ Using the prepared data, extract a point shapefile to disk. Geometry is the intersection nodes that are near bus stops. Attributes include the total number of weekday boardings, alightings, and a list of the bus routes and stop IDs that contributed to the sums. """ # Aggregate boardings/alightings by street node id and join to geometry join_query = """ with ridership_data as ( select nearest_node, sum(weekday_bo) as boardings, sum(weekday_le) as alightings, array_agg(route) as routes, array_agg(gid) as stopgid from septa_bus_stops_fall_2019 where nearest_node is not null group by nearest_node ) select row_number() over() as uid, d.*, n.geom from ridership_data d left join street_nodes n on n.streetnodeid = d.nearest_node """ gdf = db.query_as_geo_df(join_query) # The SQL array_agg() gets translated into a list by geopandas # and shapefiles don't support this data type # so instead, convert list to comma-delimited-list first for col in ["routes", "stopgid"]: gdf[col] = gdf[col].apply(lambda x: ",".join(map(str, x))) # Write to disk gdf.to_file(GIS_FOLDER / "bus_stops_with_ridership_agg.shp")
def write_query_to_geojson(filename: str, query: str, db: PostgreSQL): """ Write SQL query out to geojson file on disk. """ # Put into Google Drive, if configured if FOLDER_DATA_PRODUCTS: output_filepath = FOLDER_DATA_PRODUCTS / f"{filename}.geojson" # Otherwise just drop it into the active directory else: output_filepath = f"{filename}.geojson" # Extract geodataframe from SQL df = db.query_as_geo_df(query) # Save to file df.to_file(output_filepath, driver="GeoJSON")
def _test_transfer_data_spatial(db1: PostgreSQL, db2: PostgreSQL, shp: DataForTest): table_name = shp.NAME # Make sure that this table does not exist in the 2nd database if table_name in db2.all_tables_as_list(): db2.table_delete(table_name) # Transfer to the 2nd database db1.transfer_data_to_another_db(table_name, db2) # Confirm there is now a table in the DB assert table_name in db2.all_spatial_tables_as_dict()
def create_project_database(local_db: PostgreSQL, shp_folder: Path): """ Batch execute the whole process: 1) copy SQL data 2) import shapefiles 3) load a median() function 4) make some helper GIS data 5) import transit data from OpenData portals 6) save pg_dump of database """ if platform.system() in ["Linux", "Windows"] \ and "dvrpc.org" in socket.getfqdn(): dvrpc_credentials = pGIS.configurations()["dvrpc_gis"] remote_db = PostgreSQL("gis", **dvrpc_credentials) import_production_sql_data(remote_db, local_db) import_shapefiles(shp_folder, local_db) load_helper_functions(local_db) create_new_geodata(local_db) import_transit_data(local_db) else: print("\n-> !!!Initial DB setup can only be executed from a DVRPC workstation!!!")
def assign_stops_to_street_intersections(db: PostgreSQL): # 3) Assign closest street node ID to all SEPTA bus stops near study area # ----------------------------------------------------------------------- df = db.query_as_df( """ select gid from septa_bus_stops_fall_2019 where st_dwithin( st_transform(geom, 26918), (select st_collect(geom) from study_bounds sb), 30 ) """ ) db.table_add_or_nullify_column("septa_bus_stops_fall_2019", "nearest_node", "int") for _, row in tqdm(df.iterrows(), total=df.shape[0]): gid = row[0] query = f""" select n.streetnodeid from septa_bus_stops_fall_2019 s, street_nodes n where s.gid = {gid} order by st_distance(st_transform(s.geom, 26918), n.geom) asc limit 1 """ node_id = db.query_as_single_item(query) update = f""" update septa_bus_stops_fall_2019 set nearest_node = {node_id} where gid = {gid} """ db.execute(update)
def database_1(): # Set up the database db = PostgreSQL("test_from_ward", verbosity="minimal", **configurations()["localhost"]) # Import CSV and shapefile data sources db.import_csv(test_csv_data.NAME, test_csv_data.PATH_URL, if_exists="replace") db.import_geodata(test_shp_data.NAME, test_shp_data.PATH_URL, if_exists="replace") # Yield to the test yield db # Don't tear down this database!!! # This is done later as part of test_final_cleanup.py # Delete temp shapefiles test_shp_data.flush_local_data() test_csv_data.flush_local_data()
def _test_db_delete(db: PostgreSQL): db.db_delete() assert not db.exists()
def _test_import_csv(db: PostgreSQL, csv: DataForTest): # Confirm the CSV is now a table in the DB assert csv.NAME in db.all_tables_as_list()
def generate_nodes(database: str, tablename: str): """ Generate topologically-sound nodes for the sidewalk lines """ db = PostgreSQL(database, verbosity="minimal", **CREDENTIALS["localhost"]) generate_sidewalk_nodes(db, tablename)
def prepare_trail_data(db: PostgreSQL): # Filter down to only the existing trails # --------------------------------------- trail_query = " SELECT * FROM circuittrails WHERE circuit = 'Existing' " db.make_geotable_from_query( trail_query, "existing_trails", geom_type="LINESTRING", epsg=26918 ) # Figure out if each segment should be included # --------------------------------------------- db.table_add_or_nullify_column("existing_trails", "sw_coverage", "FLOAT") uid_list = db.query_as_list("SELECT uid FROM existing_trails") # Template to get the % covered by sidewalk features query_template = """ select sum( st_length( st_intersection(geom, (select st_buffer(geom, 10) from existing_trails where uid = UID) ) ) ) / (select st_length(geom) from existing_trails where uid = UID) from pedestriannetwork_lines where st_dwithin( st_startpoint(geom), (select geom from existing_trails where uid = UID), 10 ) or st_dwithin( st_endpoint(geom), (select geom from existing_trails where uid = UID), 10 ) """ for uid in tqdm(uid_list, total=len(uid_list)): uid = uid[0] query = query_template.replace("UID", str(uid)) result = db.query_as_single_item(query) if not result: result = 0 update_query = f""" UPDATE existing_trails SET sw_coverage = {result} WHERE uid = {uid}; """ db.execute(update_query)
poi_node_pairs = poi_node_pairs.set_geometry("flow") poi_node_pairs['geom'] = poi_node_pairs["flow"].apply( lambda x: WKTElement(x.wkt, srid=self.epsg)) for col in ["flow", "geom_from", "geom_to"]: poi_node_pairs.drop(col, inplace=True, axis=1) engine = create_engine(self.db.uri()) poi_node_pairs.to_sql( f"poi_{self.themes[theme]}_qa", engine, schema=self.schema, if_exists="replace", dtype={'geom': Geometry("LineString", srid=self.epsg)}) engine.dispose() self.poi_gdf = poi_gdf if __name__ == "__main__": from sidewalk_gaps import CREDENTIALS schema = "camden" db = PostgreSQL("sidewalk_gaps", verbosity="minimal", **CREDENTIALS["localhost"]) network = SidewalkNetwork(db, schema)
def summarize_into_hexagons(database: str): """ Classify centerlines w/ length of parallel sidewalks """ db = PostgreSQL(database, verbosity="minimal", **CREDENTIALS["localhost"]) hexagon_summary(db)
def _test_pgsql2shp(db: PostgreSQL, shp: DataForTest): # Export a spatial table to shapefile output_shp = db.pgsql2shp(shp.NAME, shp.EXPORT_FOLDER) assert output_shp.exists()
def hexagon_summary(db: PostgreSQL): db.make_hexagon_overlay("hexagons", "regional_counties", 26918, 3) for colname in [ "islands", "poi_min", "poi_median", "poi_max", "cl_len", "sw_len" ]: db.table_add_or_nullify_column("hexagons", colname, "FLOAT") for state, schema in [("New Jersey", "nj"), ("Pennsylvania", "pa")]: print(f"Processing {state}") hex_query = f""" SELECT * FROM hexagons WHERE st_intersects( st_centroid(geom), (select st_collect(geom) from regional_counties where state_name = '{state}' ) ) """ db.make_geotable_from_query(hex_query, "hexagon_summary", "POLYGON", 26918, schema=schema) uid_query = f""" SELECT uid FROM {schema}.hexagon_summary """ uid_list = db.query_as_list(uid_query) for uid in tqdm(uid_list, total=len(uid_list)): uid = uid[0] geom_subquery = f"select geom from {schema}.hexagon_summary where uid = {uid}" # Get the number of islands # ------------------------- island_update = f""" update {schema}.hexagon_summary h set islands = ( select count(island_geom) from ( SELECT ST_COLLECTIONEXTRACT( UNNEST(ST_CLUSTERINTERSECTING(geom)), 2 ) AS geom FROM {schema}.sidewalks sw where st_within(sw.geom, h.geom) ) as island_geom ) where h.uid = {uid} """ db.execute(island_update) # Get the min and max distance to nearest school # ---------------------------------------------- q_network = f""" select min(n_1_school), median(n_1_school), max(n_1_school) from {schema}.access_results where n_1_school < 180 and st_intersects( geom, ({geom_subquery}) ) """ poi_result = db.query_as_list(q_network) poi_min, poi_med, poi_max = poi_result[0] # # Replace "None" values with a dummy number if str(poi_min) == "None": poi_min = "NULL" if str(poi_med) == "None": poi_med = "NULL" if str(poi_max) == "None": poi_max = "NULL" # Get the centerline length # ------------------------- cl_query = f""" select sum(st_length(st_intersection( geom, ({geom_subquery}) ))) as cl_len from {schema}.centerlines where st_intersects(geom, ({geom_subquery})) """ cl_results = db.query_as_list(cl_query) cl_len = cl_results[0][0] if str(cl_len) == "None": cl_len = 0 # Get the sidewalk length # ------------------------- sw_query = f""" select sum(st_length(st_intersection( geom, ({geom_subquery}) ))) as sw_len from {schema}.sidewalks where st_intersects(geom, ({geom_subquery})) """ sw_results = db.query_as_list(sw_query) sw_len = sw_results[0][0] if str(sw_len) == "None": sw_len = 0 # Update the table with the results # --------------------------------- update_query = f""" UPDATE {schema}.hexagon_summary SET poi_min = {poi_min}, poi_median = {poi_med}, poi_max = {poi_max}, cl_len = {cl_len}, sw_len = {sw_len} WHERE uid = {uid} """ db.execute(update_query) # Combine state-specific hexagons into one final summary layer # ------------------------------------------------------------ query = """ SELECT * FROM nj.hexagon_summary UNION SELECT * FROM pa.hexagon_summary """ db.make_geotable_from_query(query, "hexagon_summary", "POLYGON", 26918)
def classify_centerlines( db: PostgreSQL, schema: str, tbl: str, new_col: str = "sidewalk" ): # Get a list of all centerlines we want to iterate over. oid_query = f""" SELECT objectid FROM {schema}.{tbl} """ # But first... check if the new_col exists # If so, iterate over null features # Otherwise, make the column and operate on the entire dataset column_already_existed = new_col in db.table_columns_as_list(tbl, schema=schema) if column_already_existed: print("Picking up where last left off...") oid_query += f""" WHERE {new_col} IS NULL """ else: print("Analyzing for the first time...") db.table_add_or_nullify_column(tbl, new_col, "FLOAT", schema=schema) # Hit the database oid_list = db.query_as_list(oid_query) # pop the results out of tuples into a simple list oid_list = [x[0] for x in oid_list] query_template = f""" SELECT SUM( ST_LENGTH( ST_INTERSECTION(sw.geom, (SELECT ST_BUFFER(c.geom,25))) ) ) FROM {schema}.sidewalks sw, {schema}.centerlines c where c.objectid = OID_PLACEHOLDER AND ST_INTERSECTS(sw.geom, (SELECT ST_BUFFER(c.geom,25))) AND sw.line_type = 1 AND ( ST_LENGTH( ST_INTERSECTION(sw.geom, (SELECT ST_BUFFER(c.geom,25))) ) > 25 OR ST_LENGTH(sw.geom) <= 25 ) """ for oid in tqdm(oid_list, total=len(oid_list)): oid_query = query_template.replace("OID_PLACEHOLDER", str(oid)) sidwalk_length_in_meters = db.query_as_single_item(oid_query) if not sidwalk_length_in_meters: sidwalk_length_in_meters = 0 update_query = f""" UPDATE {schema}.{tbl} c SET {new_col} = {sidwalk_length_in_meters} WHERE objectid = {oid} """ db.execute(update_query)
db.make_geotable_from_query(query, "sidewalks_and_trails", geom_type="LINESTRING", epsg=26918) def cleanup_temp_tables(db: PostgreSQL): """ Delete the intermediate tables to keep the DB clean """ for tbl in [ "trail_splits", "trail_merged", "sidewalk_splits", "sidewalk_merged" ]: db.table_delete(tbl) def merge_topologies(db: PostgreSQL): add_segmentation_to_trails(db) add_segmentation_to_sidewalks(db) merge_sidewalks_and_trails(db) cleanup_temp_tables(db) if __name__ == "__main__": from sidewalk_gaps import CREDENTIALS, PROJECT_DB_NAME db = PostgreSQL(PROJECT_DB_NAME, verbosity="minimal", **CREDENTIALS["localhost"]) merge_topologies(db)
def db_connection(db_name: str = DB_NAME) -> PostgreSQL: return PostgreSQL(db_name, verbosity="minimal")
def _test_shp2pgsql(db: PostgreSQL, shp: DataForTest): # Import the shapefile _ = db.shp2pgsql(shp.NAME, shp.IMPORT_FILEPATH) assert shp.NAME in db.all_spatial_tables_as_dict()
def db_connection(db_name: str = DB_NAME) -> PostgreSQL: return PostgreSQL(db_name, verbosity="minimal", **configurations()["localhost"])
def add_segmentation_to_trails(db: PostgreSQL): """ Split the trail layer wherever it intersects a sidewalk """ # Make a filtered version of the trail data that pedestrians can use # ------------------------------------------------------------------ trail_query = """ SELECT * FROM circuittrails WHERE circuit = 'Existing' AND (facility NOT LIKE '%%Bicycle%%' OR facility IS NULL); """ db.make_geotable_from_query(trail_query, "ped_trails", geom_type="LINESTRING", epsg=26918) # Split the trails wherever they intersect a sidwalk # -------------------------------------------------- trail_split = """ select globalid, (st_dump( st_split( t1.geom, (select st_collect(geom) from pedestriannetwork_lines p1 where st_intersects(t1.geom, p1.geom) ) ) )).geom from ped_trails t1 """ db.make_geotable_from_query(trail_split, "trail_splits", geom_type="LINESTRING", epsg=26918) # Merge the split trails with any trails that didn't get split # ------------------------------------------------------------ trail_merge = """ select 'trail - raw' as src, geom from ped_trails where not st_within(geom, (select st_buffer(st_collect(geom), 1.5) from trail_splits ts2) ) union select 'trail - split' as src, geom from trail_splits """ db.make_geotable_from_query(trail_merge, "trail_merged", geom_type="LINESTRING", epsg=26918)