Get and analyze data on real estate sale offers in Warsaw, Poland.
Using this package you can get the latest flat sale offers from Warsaw, selected according to your preferences. The data is collected from popular advertising portals (see Supported platforms) and saved for browsing or further analysis.
- olx.pl
- otodom.pl (only offers hosted on olx.pl)
.
|-- README.md
|-- __init__.py
|-- img
|   |-- price_hist.png
|   `-- price_median.png
|-- logging.json
|-- main
|   |-- __init__.py
|   |-- analysis
|   |   |-- __init__.py
|   |   `-- analyzer.py
|   `-- webscraping
|       |-- __init__.py
|       |-- ad.py
|       |-- filter.py
|       |-- offer.py
|       `-- scraper.py
|-- requirements.txt
|-- run.py
`-- utils
    |-- __init__.py
    |-- logging_config.py
    `-- set_locale.py
Subpackage responsible for scraping data from the advertising portal
filter.py
- get available filters, translate filters into URL parameters, set filters according to the user's definition
ad.py
- collect information from advertisement listing pages (price, date added etc.)
offer.py
- collect information from individual offers (number of rooms, floor etc.)
scraper.py
- create a scraper that browses the portal and finds offers
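The translation from human-readable filter names to portal URL parameters might look roughly like the sketch below. The parameter names and the `build_query` helper are illustrative assumptions, not the package's actual mapping, which lives in main/webscraping/filter.py.

```python
from urllib.parse import urlencode

# Hypothetical mapping from user-facing filter names to OLX query
# parameters; the real table in filter.py may differ.
FILTER_PARAMS = {
    "Cena do": "search[filter_float_price:to]",
    "Pow. od": "search[filter_float_m:from]",
}

def build_query(selected_filters):
    """Translate known filter names into a URL query string, skipping unknown keys."""
    params = {FILTER_PARAMS[name]: value
              for name, value in selected_filters.items()
              if name in FILTER_PARAMS}
    return urlencode(params)

query = build_query({"Cena do": "700000", "Pow. od": "40"})
```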
Subpackage responsible for the analysis of collected flat offer data
analyzer.py
- read offer data, summarize prices overall and across districts
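In pandas terms, the district summary amounts to a group-by over the collected offers. This is only a sketch of the idea; the column names and sample values are assumptions, and the real analyzer reads the scraper's JSON export instead of an inline DataFrame.

```python
import pandas as pd

# Assumed data shape: one row per offer with a district and a price in PLN.
offers = pd.DataFrame({
    "district": ["Wola", "Wola", "Mokotów"],
    "price": [650_000, 690_000, 620_000],
})

# Overall price statistics (count, mean, quartiles, ...).
price_summary = offers["price"].describe()

# Median price per district, analogous to get_price_district_summary().
price_district_summary = offers.groupby("district")["price"].median()
```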
Contains small utility functions
logging_config.py
- configure logging from a JSON file. Use another logging configuration if you prefer.
set_locale.py
- change locale within context (used for handling local time definitions)
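A locale-switching context manager can be written in a few lines with the standard library. This is a minimal sketch of the idea, assuming set_locale.py works along these lines; the names are illustrative, not the module's actual API.

```python
import locale
from contextlib import contextmanager

@contextmanager
def set_locale(name, category=locale.LC_TIME):
    """Temporarily switch the locale for one category, restoring it afterwards."""
    saved = locale.setlocale(category)  # remember the current setting
    try:
        yield locale.setlocale(category, name)
    finally:
        locale.setlocale(category, saved)  # always restore

# Usage: parse Polish month names in offer dates, e.g.
# with set_locale("pl_PL.UTF-8"):
#     added = datetime.strptime("12 stycznia 2023", "%d %B %Y")
```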
Contains example visualizations
In the run.py file you can find an example usage of this package.
- Define the filters you want to apply to the search.
```python
# Define parameters for search
# See available filters and values
# on https://www.olx.pl/nieruchomosci/mieszkania/sprzedaz/warszawa/
selected_filters = {'Umeblowane': 'Tak',
                    'Liczba pokoi': ('2 pokoje', '3 pokoje'),
                    'Cena do': '700000',
                    'Dzielnica': ['Bemowo', 'Włochy', 'Wola', 'Ursynów',
                                  'Śródmieście', 'Ochota', 'Mokotów'],
                    'Pow. od': '40'}
```
- Run scraper and export collected data to file.
```python
# Run scraper
scraper = OLXScraper(selected_filters)
scraper.run()

# Export data
data_file = Path('.') / "data" / "scraper_data.json"
scraper.export_data(data_file)
```
- Read collected data and run price analysis. The results are pandas DataFrames and plots.
```python
# Read and analyze data
ofan = OfferAnalyzer(data_file)
price_summary = ofan.get_price_summary()
price_district_summary = ofan.get_price_district_summary()
ofan.show_plots()
```
- By default log files are stored in log/. See run.py.
- By default data files are stored in data/. See run.py.
See dependencies for a conda environment in requirements.txt.
The Polish locale has to be installed:

```shell
sudo apt-get install language-pack-pl
```
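You can check from Python whether the locale is available before running the scraper. The helper below is an illustration, not part of the package; it relies only on the standard library's `locale.setlocale`, which raises `locale.Error` for a missing locale.

```python
import locale

def has_locale(name, category=locale.LC_TIME):
    """Return True if the given locale can be activated, restoring the previous one."""
    saved = locale.setlocale(category)
    try:
        locale.setlocale(category, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(category, saved)

# e.g. has_locale("pl_PL.UTF-8") should be True after installing language-pack-pl
```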