nepremicnine-monitor
A terminal-based real estate monitor for nepremicnine.net, Slovenia's largest property listing portal. Tracks listings across configurable searches, enriches them with public data, and estimates fair prices.
Features
TUI (Terminal User Interface)
Interactive Textual-based interface with three tabs:
- Listings — sortable table of all tracked listings showing price, size, year, distance to Ljubljana, estimated fair price, GURS valuation, description tags, and more
- Summary — aggregated view grouped by region, municipality, and property type with averages and counts (deduplicates cross-listed properties)
- Log — live scraping progress
Key bindings:
| Key | Action |
|---|---|
| `f` | Open filter dialog (year, price, size, distance, categories) |
| `r` | Trigger immediate scrape of all monitors |
| `1`/`2`/`3` | Switch tabs |
| `Enter` | Open listing in browser |
| `q` | Quit |
Scraping
- Uses `curl_cffi` with Chrome impersonation to bypass Cloudflare
- Paginated search result scraping with duplicate page detection
- Detail page scraping for full listing data (location, description, year adapted)
- Configurable crawl delay and request timeout
- Automatic periodic re-scraping at a configurable interval
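A minimal sketch of how paginated scraping with duplicate page detection could work. It is not the project's actual scraper: `scrape_pages` and `fetch_page` are hypothetical names, the code is synchronous, and the page fetcher is faked so the example runs without network access. The assumption illustrated is that a site may keep serving the last real page for out-of-range page numbers, so a repeated set of listing IDs signals the end.

```python
from typing import Callable, Iterable


def scrape_pages(
    fetch_page: Callable[[int], Iterable[str]],
    max_pages: int,
) -> list[str]:
    """Collect listing IDs page by page, stopping on an empty or repeated page."""
    seen_pages: set[frozenset[str]] = set()
    seen_ids: set[str] = set()
    collected: list[str] = []

    for page_number in range(1, max_pages + 1):
        ids = list(fetch_page(page_number))
        fingerprint = frozenset(ids)
        # An empty page, or a page whose ID set was already seen,
        # is treated as running past the last real page.
        if not ids or fingerprint in seen_pages:
            break
        seen_pages.add(fingerprint)
        for listing_id in ids:
            if listing_id not in seen_ids:
                seen_ids.add(listing_id)
                collected.append(listing_id)
    return collected


def fake_fetch(page_number: int) -> list[str]:
    """Stand-in for a real HTTP fetch: two real pages, then the last one repeats."""
    pages = {1: ["a", "b"], 2: ["c", "d"]}
    return pages.get(page_number, pages[2])


result = scrape_pages(fake_fetch, max_pages=25)
```

With `max_pages = 25` the loop still stops after the third request, because page 3 repeats page 2.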
Data Enrichment
GURS Cadastral Data
Queries the GURS (Geodetska uprava RS, the Surveying and Mapping Authority of the Republic of Slovenia) cadastral API to match listings with official building records:
- Matches by settlement name + building area (m2) within a configurable tolerance
- Retrieves official building valuation (`posplosena vrednost`) for price comparison
- Handles Ljubljana district normalization (e.g. "LJ-Vic" to "Vic")
- Falls back to sub-area extraction from listing titles when settlements exceed the API's 200-result limit
- Match statuses: `matched`, `multiple_matches`, `no_match`, `too_many_results`, `no_settlement`
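The matching rules above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `gurs_enricher.py`: the `Building` record, `normalize_settlement`, and `match_listing` are hypothetical names, the candidates are passed in as a plain list instead of coming from the GURS API, and a candidate list that reaches the 200-result limit is assumed to be truncated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Building:
    settlement: str
    area_m2: float
    valuation_eur: float


def normalize_settlement(name: str) -> str:
    """Strip a Ljubljana district prefix, e.g. "LJ-Vic" becomes "Vic"."""
    prefix = "LJ-"
    return name[len(prefix):] if name.upper().startswith(prefix) else name


def match_listing(
    settlement: Optional[str],
    area_m2: float,
    candidates: list[Building],
    tolerance_m2: float = 2.0,
    result_limit: int = 200,
) -> tuple[str, Optional[Building]]:
    """Return (status, building); building is set only for a unique match."""
    if not settlement:
        return "no_settlement", None
    # Assumption: hitting the limit means the result set was cut off.
    if len(candidates) >= result_limit:
        return "too_many_results", None
    target = normalize_settlement(settlement).casefold()
    hits = [
        b for b in candidates
        if b.settlement.casefold() == target
        and abs(b.area_m2 - area_m2) <= tolerance_m2
    ]
    if not hits:
        return "no_match", None
    if len(hits) > 1:
        return "multiple_matches", None
    return "matched", hits[0]


records = [
    Building("Vic", 84.5, 210_000.0),
    Building("Vic", 120.0, 305_000.0),
]
status, building = match_listing("LJ-Vic", 85.0, records)
```

Here the 85.0 m2 listing falls within 2.0 m2 of exactly one record, so the status is `matched`.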
Geocoding & Distance
- Geocodes municipalities via Nominatim (OpenStreetMap)
- Computes driving distance to Ljubljana via OSRM, with Haversine fallback
- Results cached in SQLite to avoid repeated API calls
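For reference, the Haversine fallback can be sketched as below. The function names are hypothetical, and `driving_lookup` stands in for whatever routing call the project makes. The real code queries OSRM, which is not reproduced here. Treating any exception from the lookup as a reason to fall back is also an assumption.

```python
import math
from typing import Callable, Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
Point = tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_km(
    origin: Point,
    destination: Point,
    driving_lookup: Optional[Callable[[Point, Point], Optional[float]]] = None,
) -> float:
    """Prefer a driving distance from a routing lookup; fall back to Haversine."""
    if driving_lookup is not None:
        try:
            driving = driving_lookup(origin, destination)
            if driving is not None:
                return driving
        except Exception:
            pass  # routing service unavailable: use the straight-line estimate
    return haversine_km(*origin, *destination)


def broken_router(origin: Point, destination: Point) -> Optional[float]:
    """Simulates a routing service that is down."""
    raise ConnectionError("routing service unreachable")


one_degree_north = distance_km((0.0, 0.0), (1.0, 0.0), driving_lookup=broken_router)
```

Note that Haversine gives a straight-line distance, so it will generally underestimate the driving distance it replaces.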
Description Tagging
Regex-based extraction of structured tags from Slovenian listing descriptions:
- Permits: `use_permit`, `no_use_permit`, `building_permit`, `no_building_permit`
- Legal: `illegal_build`, `under_construction`
- Condition: `needs_renovation`, `renovated`
- Financial: `investment`, `rental_tenants`
- Amenities: `garage`, `parking`
- Energy: `energy_class_A1` through `energy_class_G`
Negative signals override positive ones (e.g. "nima uporabnega dovoljenja", meaning "has no use permit", suppresses `use_permit`).
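A minimal sketch of how such an override could be implemented. The regular expressions below are illustrative guesses, not the patterns in `description_tagger.py`, and `extract_tags` and `SUPPRESSES` are hypothetical names. Only the tag names and the example phrase come from this README.

```python
import re

# Hypothetical patterns covering common inflections of "uporabno dovoljenje".
NEGATIVE_PATTERNS = {
    "no_use_permit": re.compile(
        r"\b(?:nima|brez)\s+uporabn\w+\s+dovoljenj\w+", re.IGNORECASE
    ),
}
POSITIVE_PATTERNS = {
    "use_permit": re.compile(r"\buporabn\w+\s+dovoljenj\w+", re.IGNORECASE),
    "garage": re.compile(r"\bgaraž\w*", re.IGNORECASE),
}
# Which positive tag each negative tag cancels.
SUPPRESSES = {"no_use_permit": "use_permit"}


def extract_tags(description: str) -> set[str]:
    """Apply negative patterns first, then positive ones that were not suppressed."""
    tags = {
        tag for tag, pattern in NEGATIVE_PATTERNS.items()
        if pattern.search(description)
    }
    suppressed = {SUPPRESSES[tag] for tag in tags}
    for tag, pattern in POSITIVE_PATTERNS.items():
        if tag not in suppressed and pattern.search(description):
            tags.add(tag)
    return tags


negative_example = extract_tags("Hiša nima uporabnega dovoljenja, ima garažo.")
positive_example = extract_tags("Stanovanje ima uporabno dovoljenje.")
```

Checking negatives first matters because the positive pattern for `use_permit` also matches inside the negated phrase.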
Price Estimation
OLS (Ordinary Least Squares) regression trained per property type in log-price space:
- Houses: features are building size, land size, effective age, driving distance
- Apartments: features are apartment size, effective age, driving distance, floor number
- Outlier filtering (2.5 sigma on price/m2) before training
- Uses `year_adapted` over `year_built` for effective age when available
- Minimum 8 listings per group to train; no external dependencies (pure Python solver)
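To make the approach concrete, here is a small pure-Python sketch of OLS in log-price space, solved through the normal equations with Gaussian elimination. It is one plausible way to do this and not necessarily how `pricing.py` works: all function names are hypothetical, the training data is synthetic, and outlier filtering is left out for brevity.

```python
import math
from typing import Optional, Sequence

MIN_LISTINGS = 8  # minimum group size stated above


def solve_linear_system(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_log_price(
    features: Sequence[Sequence[float]], prices: Sequence[float]
) -> Optional[list[float]]:
    """Fit log(price) = b0 + b1*x1 + ...; return None for groups that are too small."""
    if len(prices) < MIN_LISTINGS:
        return None
    rows = [[1.0, *f] for f in features]  # prepend the intercept column
    y = [math.log(p) for p in prices]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_price(coefficients: Sequence[float], feature_row: Sequence[float]) -> float:
    """Map the linear prediction back from log space to a price."""
    log_price = coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], feature_row))
    return math.exp(log_price)


def effective_age(current_year: int, year_built: int, year_adapted: Optional[int] = None) -> int:
    """Prefer the adaptation year over the build year when it is known."""
    return current_year - (year_adapted if year_adapted is not None else year_built)


# Synthetic data generated from log(price) = 11 + 0.01*size - 0.005*age.
sizes = [50, 60, 75, 80, 95, 100, 110, 120]
ages = [10, 40, 5, 30, 20, 50, 15, 25]
features = [(float(s), float(a)) for s, a in zip(sizes, ages)]
prices = [math.exp(11 + 0.01 * s - 0.005 * a) for s, a in features]

coefficients = fit_log_price(features, prices)
estimate = predict_price(coefficients, (90.0, 12.0))
```

Because the synthetic prices follow the model exactly, the fit recovers the generating coefficients up to floating-point error. Real listings would leave residuals.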
Installation
Requires Python 3.11+.
pip install -e .
For development (tests, linting):
pip install -e ".[dev]"
Configuration
Create a config.toml file:
[[monitors]]
name = "Apartments Slovenia"
url = "https://www.nepremicnine.net/oglasi-prodaja/slovenija/stanovanje/?s=16"
max_pages = 25
[[monitors]]
name = "Houses Slovenia"
url = "https://www.nepremicnine.net/oglasi-prodaja/slovenija/hise/?s=16"
max_pages = 25
[scraper]
crawl_delay_seconds = 5
request_timeout_seconds = 30
[database]
path = "nepremicnine.db"
[monitor]
interval_minutes = 60
Each [[monitors]] entry defines a search URL to track. The url should be a nepremicnine.net search results page; max_pages controls how deep pagination goes.
Usage
TUI mode (default)
nepremicnine-monitor config.toml
CLI commands
# Backfill listings from a specific search URL
nepremicnine-monitor config.toml backfill-search "https://www.nepremicnine.net/oglasi-prodaja/..."
# Geocode all municipalities and compute driving distances
nepremicnine-monitor config.toml backfill-distances
# Enrich listings with GURS cadastral data (optional tolerance in m2)
nepremicnine-monitor config.toml enrich-gurs 2.0
# Extract description tags for all untagged listings
nepremicnine-monitor config.toml tag-descriptions
Project Structure
nepremicnine_monitor/
  __main__.py                 CLI entry point, subcommand dispatch
  scraper.py                  Async HTTP client with Cloudflare bypass
  parser.py                   HTML parsing for search results and detail pages
  db.py                       SQLite database layer (thread-safe, WAL mode)
  tui.py                      Textual TUI application
  geocoder.py                 Nominatim + OSRM geocoding and distance
  gurs_enricher.py            GURS cadastral API integration
  description_tagger.py       Regex-based tag extraction from Slovenian text
  pricing.py                  OLS fair price estimator
tests/
  test_db.py                  Database CRUD and query tests
  test_parser.py              HTML parsing tests
  test_scraper.py             Pagination and dedup tests
  test_description_tagger.py  Tag extraction tests
  test_pricing.py             OLS estimator tests
Development
# Run tests
pytest tests/ -v
# Format
black nepremicnine_monitor/ tests/
isort nepremicnine_monitor/ tests/
# Lint
pylint nepremicnine_monitor/
Pre-commit hooks are configured for black, isort, and pylint.
License
Private project.