The Avalanche Canada data analysis toolkit is a collection of Python files for retrieving, processing, and visualizing historically forecasted Avalanche Canada danger ratings for select forecast regions and dates. Specifically, this toolkit provides a method to scrape danger rating data and examines danger rating statistics and anomalies between current-day and 1- and 2-day-out forecasted danger ratings.
For help, questions, or comments, contact hurleyldave@gmail.com or message me on Medium
- Current Conditions - the present day's danger ratings as forecasted by Avalanche Canada
- Current Plus1 Conditions - tomorrow's danger ratings (i.e. one day ahead of current conditions) as forecasted by Avalanche Canada
- Current Plus2 Conditions - the day after tomorrow's danger ratings (i.e. two days ahead of current conditions) as forecasted by Avalanche Canada
- Forecast Anomaly - the percentage of time that a 1- or 2-day-out forecasted danger rating agrees or disagrees with current conditions. This provides insight into forecast confidence and conservativeness (i.e. do forecasters predict a higher danger rating than what is presented on the day of)
- Danger Ratings - avalanche hazard on a scale from 1 (lowest) to 5 (highest)
- Forecast Region - geographic area to which an Avalanche Canada forecast applies
- Alpine, Treeline, Belowtree - elevation band to which the forecast applies (i.e. alpine is above treeline, treeline is sparse trees, belowtree is forested)
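As a sketch of how a forecast anomaly percentage could be computed, assuming ratings are stored as simple integer series (the sample values below are made up for illustration, not taken from the toolkit's data files):

```python
# Sketch: compute the forecast anomaly rate between current-day ratings
# and 1-day-out forecasted ratings. Ratings are on the 1 (lowest) to
# 5 (highest) scale; the sample data here is hypothetical.

def anomaly_rate(current, forecast):
    """Percentage of days where the forecast differs from current conditions."""
    if len(current) != len(forecast):
        raise ValueError("rating series must be the same length")
    differs = sum(1 for c, f in zip(current, forecast) if c != f)
    return 100.0 * differs / len(current)

current_day = [2, 3, 3, 4, 2]   # danger ratings presented on the day of
one_day_out = [2, 3, 4, 4, 3]   # ratings forecasted one day earlier
print(anomaly_rate(current_day, one_day_out))  # 40.0 (2 of 5 days differ)
```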
- Data scraping (i.e. retrieval) code lives in `scripts`
- Test tools to confirm data scraping paths are still relevant live in `tests`
- Raw and cleaned data for select forecast regions and dates live in `data`
- Jupyter Notebook to analyze and visualize danger rating data lives in `notebooks`
- Result figures are saved to `figures`
- Clone this repo, instructions found HERE
- Open a command prompt and navigate to the newly cloned repo
- Create a virtual environment by executing `python -m venv YOUR-VENV-NAME` in a command prompt, replacing `YOUR-VENV-NAME` with whatever you like. Instructions found HERE
- Activate the virtual environment; in Linux this is `source YOUR-VENV-NAME/bin/activate`
- Install dependencies by executing `pip install -r requirements.txt` in a command prompt
Code to scrape and save historical Avalanche Canada danger ratings for day-of and 1- and 2-day-out conditions for any forecast region and date range. Raw data is saved to `data/raw`. Scraping is performed with Python and Selenium.
Scraping involves launching a web browser, in this case Firefox, and extracting information from a desired page path. Sometimes the page path can change or break, so it's a good idea to test the scraping code prior to use.
Perform the following to test the scraping code:
- Open a command prompt and navigate to the root directory of the cloned repo (likely `avalanche-canada-data-analysis`)
- Execute `python -m unittest discover tests`
- If the code passes the tests, an `OK` will be displayed (note, this may take 10-15 seconds)
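The repo's actual tests live in `tests`; as a rough illustration of the kind of path-sanity check `unittest discover` picks up, a test might look like this (the URL pattern, function name, and test name are assumptions for the example, not taken from the repo):

```python
import unittest

# Hypothetical example of a scraping-path sanity test; the real tests in
# `tests` verify that Avalanche Canada page paths are still relevant.
BASE_URL = "https://avalanche.ca/forecasts"  # assumed base path

def build_forecast_url(region):
    """Build a forecast page URL for a region slug (e.g. 'sea-to-sky')."""
    return f"{BASE_URL}/{region}"

class TestScrapePaths(unittest.TestCase):
    def test_region_url_format(self):
        url = build_forecast_url("sea-to-sky")
        self.assertEqual(url, "https://avalanche.ca/forecasts/sea-to-sky")

# Run the check directly (the repo instead uses `python -m unittest discover tests`)
result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestScrapePaths)
)
print("OK" if result.wasSuccessful() else "FAILED")
```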
Perform the following to scrape new data:
- Open `scrape_inputs.json` and set the desired forecast region and date range to scrape. Also, determine if a web browser should be displayed while scraping (suggest yes as it provides feedback). Note, the forecast region must match EXACTLY with the Avalanche Canada regions. To check a forecast region name, go HERE, select the forecast region, then confirm the forecast region name displayed in the URL address bar (i.e. this might be `sea-to-sky` or `south-coast-inland`)
- Open a command prompt or IDE and navigate to `scripts`
- Execute `python scrape_export_data.py`, this may take some time. The results will be saved to `data/raw`
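A minimal sketch of how a script like `scrape_export_data.py` might read its JSON inputs and build a region URL. The keys in the sample file and the URL pattern are assumptions for illustration, not the script's actual schema:

```python
import json
from pathlib import Path

# Hypothetical scrape_inputs.json contents; the real file's keys may differ.
sample_inputs = {
    "forecast_region": "sea-to-sky",   # must match the slug in the URL bar
    "start_date": "2019-12-01",
    "end_date": "2020-03-31",
    "show_browser": True,              # display Firefox while scraping
}
path = Path("scrape_inputs_example.json")
path.write_text(json.dumps(sample_inputs, indent=2))

# Load the inputs back, as the scraping script would at startup.
inputs = json.loads(path.read_text())
url = f"https://avalanche.ca/forecasts/{inputs['forecast_region']}"  # assumed pattern
print(url)
```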
Code to clean missing data, remove gaps in the record, and save cleaned data to `data/cleaned`.
Perform the following to clean raw data:
- Open `clean_inputs.json`, point to the desired files to edit, and set the forecast region. The filenames must match files in `data/raw`
- Open a command prompt or IDE and navigate to `scripts`
- Execute `python clean_scraped_data.py`. Cleaned datasets are saved to `data/cleaned`
Jupyter Notebook to perform data analysis and data visualization on the cleaned dataset.
Perform the following:
- Open a command prompt in the root directory
- Execute `ipython kernel install --user --name=YOUR-VENV-NAME`, replacing `YOUR-VENV-NAME` with the name of your virtual environment
- Launch Jupyter Notebook (`jupyter notebook` in the command prompt), navigate to `notebooks`, and open file `2020_10_10_dh_clean_explore_data.ipynb`
- In the toolbar select `kernel` and choose `venv`
- Follow instructions in the notebook to point to desired cleaned files and run code

Alternatively, this can be run using `2020_10_10_dh_clean_explore_data.py`
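As a small, self-contained illustration of the kind of summary statistics the notebook computes, assuming one record per day with a rating for each elevation band (the field names and values are invented for the example; the actual cleaned files may be structured differently):

```python
from collections import Counter

# Hypothetical cleaned records: one row per day with a danger rating for
# each elevation band; the real cleaned data may use different field names.
records = [
    {"alpine": 4, "treeline": 3, "belowtree": 2},
    {"alpine": 3, "treeline": 3, "belowtree": 2},
    {"alpine": 4, "treeline": 4, "belowtree": 3},
    {"alpine": 2, "treeline": 2, "belowtree": 1},
]

# Distribution of danger ratings in the alpine band.
alpine_counts = Counter(row["alpine"] for row in records)
print(dict(sorted(alpine_counts.items())))  # {2: 1, 3: 1, 4: 2}
```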