lidl-monitor
Self-hosted scraper for the Lidl Slovenia weekly catalog. Pulls the full SI assortment twice a week, keeps a forever archive in SQLite, exposes a small FastAPI UI + RSS feed, and pushes Gotify notifications when new products match your keywords.
FastAPI + SQLite + APScheduler, packaged as a single Docker container.
Quick deploy
The expected deployment is a Linux host (any architecture Docker supports) that you reach over Tailscale. Add Tailscale Funnel if you want the RSS feed and links reachable from the public internet (e.g. so your phone's RSS reader can reach it when off-tailnet).
1. Clone and configure
git clone <your-fork-url> lidl-monitor
cd lidl-monitor
cp .env.example .env
Edit .env:
| Key | What it does |
|---|---|
| BASE_URL | External URL the app is reachable at. Used for RSS <link> tags and Gotify click-through. On a Tailscale Funnel setup this is your https://<host>.<tailnet>.ts.net name. |
| GOTIFY_URL, GOTIFY_TOKEN | Your Gotify server + app token. Leave both empty to disable push entirely. |
| GOTIFY_PRIORITY | 1–10; 5 is a normal notification. |
| RUN_DAYS, RUN_HOUR, RUN_MINUTE, TIMEZONE | Scrape schedule. Defaults (mon,thu 06:00 Europe/Ljubljana) match Lidl SI's rotation. |
| LIDL_LOCALE, LIDL_ASSORTMENT | Leave as sl_SI / SI unless scraping another Lidl region. |
Don't commit .env — .gitignore already excludes it. The SQLite
database (data/lidl.db) is tracked, so the catalog history travels
with the repo.
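As a rough illustration of how the keys above could be read at startup, here is a minimal sketch. The `Settings` class and `load_settings` helper are hypothetical names, not taken from the project. The schedule and locale defaults come from the table; the `BASE_URL` fallback, the `GOTIFY_PRIORITY` default of 5, and treating a half-configured Gotify pair as disabled are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the .env keys listed in the table."""

    base_url: str
    gotify_url: str
    gotify_token: str
    gotify_priority: int
    run_days: tuple[str, ...]
    run_hour: int
    run_minute: int
    timezone: str
    lidl_locale: str
    lidl_assortment: str

    @property
    def push_enabled(self) -> bool:
        # The README says empty Gotify values disable push. Requiring BOTH
        # values to be set is an assumption of this sketch.
        return bool(self.gotify_url and self.gotify_token)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment-like mapping."""
    raw_days = env.get("RUN_DAYS", "mon,thu")
    return Settings(
        # Assumed fallback; a trailing slash is dropped so links join cleanly.
        base_url=env.get("BASE_URL", "http://localhost:8000").rstrip("/"),
        gotify_url=env.get("GOTIFY_URL", "").strip(),
        gotify_token=env.get("GOTIFY_TOKEN", "").strip(),
        gotify_priority=int(env.get("GOTIFY_PRIORITY", "5")),  # assumed default
        run_days=tuple(d.strip().lower() for d in raw_days.split(",") if d.strip()),
        run_hour=int(env.get("RUN_HOUR", "6")),
        run_minute=int(env.get("RUN_MINUTE", "0")),
        timezone=env.get("TIMEZONE", "Europe/Ljubljana"),
        lidl_locale=env.get("LIDL_LOCALE", "sl_SI"),
        lidl_assortment=env.get("LIDL_ASSORTMENT", "SI"),
    )
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in tests.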
2. Start the container
docker compose up --build -d
docker compose logs -f # tail
The app listens on http://<host>:8000. data/ is bind-mounted into
the container at /data, so the database survives rebuilds.
Sanity check:
curl http://localhost:8000/healthz # {"ok": true}
curl -X POST http://localhost:8000/admin/refresh # triggers a scrape
3. Expose with Tailscale Funnel (optional)
If the box is already in your tailnet:
# Make the service reachable to everyone on your tailnet.
sudo tailscale serve --bg --https=443 http://localhost:8000
# Or expose it to the public internet (requires Funnel enabled in the
# admin console for this node).
sudo tailscale funnel --bg --https=443 http://localhost:8000
Then set BASE_URL=https://<host>.<tailnet>.ts.net in .env and
restart:
docker compose up -d
RSS subscribers (e.g. in an RSS reader on your phone) can now hit
https://<host>.<tailnet>.ts.net/feed.xml from anywhere.
4. Add keywords
Open /keywords in a browser and add substring or regex patterns. The
/keywords page shows a live preview of what each keyword would match
in the current catalog. The home page's "Latest by keyword" section
shows active keyword hits in the current catalog, re-evaluated on every
load, so new keywords work immediately without waiting for the next
scrape.
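To make the substring-versus-regex distinction concrete, here is a minimal sketch of how a keyword could be evaluated against product names. The `Keyword` class, the `matches` and `preview` helpers, case-insensitive matching, and the sample product names are all assumptions for illustration; the project's real matching rules may differ.

```python
import re
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Keyword:
    """Hypothetical keyword record: a pattern plus a regex flag."""

    pattern: str
    is_regex: bool = False


def matches(keyword: Keyword, product_name: str) -> bool:
    """Return True if the keyword hits the product name.

    Case-insensitive matching is an assumption of this sketch.
    """
    if keyword.is_regex:
        return re.search(keyword.pattern, product_name, re.IGNORECASE) is not None
    return keyword.pattern.casefold() in product_name.casefold()


def preview(keywords: Iterable[Keyword], product_names: list[str]) -> dict[str, list[str]]:
    """Map each pattern to the names it would match, like a live preview."""
    return {
        kw.pattern: [name for name in product_names if matches(kw, name)]
        for kw in keywords
    }


if __name__ == "__main__":
    # Made-up catalog entries, for illustration only.
    catalog = ["Parkside Akku-Bohrschrauber", "Silvercrest kavni aparat", "Kava mleta 500 g"]
    print(preview([Keyword("parkside"), Keyword(r"^kav", is_regex=True)], catalog))
```

Because the preview is computed directly from the current catalog, a newly added keyword shows hits without any scrape having to run first.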
New keyword hits produce a Gotify push on the next scrape run. Pushes
are idempotent and restart-safe: the match.notified_at column records
which hits have already been delivered.
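One way the notified_at column can make pushes restart-safe is sketched below. Only the `match` table and its `notified_at` column come from this README; the `keyword_id` and `product_id` columns, the `notify_pending` helper, and the injected `send` callable are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable


def notify_pending(conn: sqlite3.Connection, send: Callable[[int, int], None]) -> int:
    """Send a push for every match not yet delivered; return how many were sent.

    `send` stands in for the real Gotify call. A row is stamped only after
    `send` returns, so a failure or restart leaves notified_at NULL and the
    hit is retried on the next run.
    """
    # "match" is quoted because MATCH is also an SQL keyword.
    rows = conn.execute(
        'SELECT keyword_id, product_id FROM "match" WHERE notified_at IS NULL'
    ).fetchall()
    sent = 0
    for keyword_id, product_id in rows:
        send(keyword_id, product_id)
        conn.execute(
            'UPDATE "match" SET notified_at = ? WHERE keyword_id = ? AND product_id = ?',
            (datetime.now(timezone.utc).isoformat(), keyword_id, product_id),
        )
        conn.commit()  # commit per row so progress survives a crash mid-batch
        sent += 1
    return sent
```

Calling the function twice in a row sends nothing the second time. Under this design a crash between `send` and the commit could repeat one push, which is usually an acceptable trade-off for a notification feed.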
Day-to-day
docker compose logs -f # follow logs
docker compose up -d # after changing .env only (restart does not reload it)
docker compose up --build -d # after changing code
curl -X POST http://localhost:8000/admin/refresh # scrape on demand
A typical scrape logs 5 paged HTTP requests + fetched N products. If
you see a 401 after page 1, the Lidl WAF (Myra Cloud) rejected the
request — check the Accept header in fetcher.py. See CLAUDE.md for
the gory details.
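The paged fetch and the 401 case can be pictured with the sketch below. It is not the code in fetcher.py: `fetch_all`, `WafRejected`, the injected `fetch_page` callable, and the stop-on-empty-page rule are all assumptions chosen to keep the example free of real network calls.

```python
from typing import Callable


class WafRejected(RuntimeError):
    """Hypothetical error raised when a page request comes back 401."""


def fetch_all(
    fetch_page: Callable[[int], tuple[int, list[dict]]],
    max_pages: int = 50,
) -> list[dict]:
    """Collect products page by page until an empty page is returned.

    `fetch_page(n)` is assumed to return (http_status, items) for page n.
    """
    products: list[dict] = []
    for page in range(1, max_pages + 1):
        status, items = fetch_page(page)
        if status == 401:
            # Fail loudly with a hint instead of archiving a partial catalog.
            raise WafRejected(f"401 on page {page}: check the Accept header")
        if status != 200:
            raise RuntimeError(f"unexpected HTTP {status} on page {page}")
        if not items:
            break
        products.extend(items)
    return products
```

Raising on the first 401 means a rejected run never looks like a week with fewer products.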
Endpoints
- GET / - home: week list, search box, latest hits per keyword.
- GET /search?q=… - substring search across the full archive.
- GET /w/{iso_year}/{iso_week} - one week's products + flyer PDF/thumbnail.
- GET /p/{product_id} - product detail + sighting history.
- GET /feed.xml - RSS of new products. Optional ?q= filter.
- GET /keywords - keyword CRUD + live match preview.
- POST /admin/refresh - trigger a scrape now.
- GET /healthz
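To show how BASE_URL could feed the RSS <link> tags served at /feed.xml, here is a minimal standard-library sketch. The `render_feed` function, the channel title and description, the product dict keys, and the case-insensitive `q` filter are assumptions; the real feed may carry more fields.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def render_feed(base_url: str, products: list[dict], q: Optional[str] = None) -> str:
    """Render an RSS 2.0 document whose item links point at /p/{product_id}."""
    base = base_url.rstrip("/")
    if q:
        # Optional ?q= filter, modelled here as a case-insensitive substring test.
        products = [p for p in products if q.casefold() in p["name"].casefold()]

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "lidl-monitor: new products"
    ET.SubElement(channel, "link").text = base
    ET.SubElement(channel, "description").text = "New products in the weekly catalog"

    for product in products:
        url = f"{base}/p/{product['product_id']}"
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = product["name"]
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "guid").text = url  # the product URL doubles as a stable id
    return ET.tostring(rss, encoding="unicode")
```

If BASE_URL is left pointing at localhost, every link in the feed is unreachable from another device, which is why it needs to match the externally visible hostname.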
Data
Everything lives in data/lidl.db (SQLite, WAL-mode). Schema:
- run - one row per scrape cycle.
- product - one row per productId ever seen. Never deleted.
- sighting - many-to-many of runs × products, with is_new flag and per-run price.
- keyword, match - user keywords and the products they've matched (with notified_at tracking Gotify delivery).
- flyer - one row per ISO week with the decorative leaflet PDF + thumbnail.
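For orientation, the tables above could be laid out roughly as in the sketch below. Only the table names, `is_new`, the per-run price, and `notified_at` come from this README; every other column name, type, and key is a guess, and `open_db` is a hypothetical helper. The actual schema in the repository is authoritative.

```python
import sqlite3

# Assumed layout; see the lead-in for which parts are grounded in the README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS run (
    id         INTEGER PRIMARY KEY,
    started_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS product (
    product_id INTEGER PRIMARY KEY,   -- Lidl's productId; rows are never deleted
    name       TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS sighting (
    run_id     INTEGER NOT NULL REFERENCES run(id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    is_new     INTEGER NOT NULL DEFAULT 0,
    price      REAL,                  -- price as seen in this run
    PRIMARY KEY (run_id, product_id)
);
CREATE TABLE IF NOT EXISTS keyword (
    id       INTEGER PRIMARY KEY,
    pattern  TEXT NOT NULL,
    is_regex INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS "match" (
    keyword_id  INTEGER NOT NULL REFERENCES keyword(id),
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    notified_at TEXT,                 -- NULL until the Gotify push was delivered
    PRIMARY KEY (keyword_id, product_id)
);
CREATE TABLE IF NOT EXISTS flyer (
    iso_year      INTEGER NOT NULL,
    iso_week      INTEGER NOT NULL,
    pdf_url       TEXT,
    thumbnail_url TEXT,
    PRIMARY KEY (iso_year, iso_week)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database, request WAL journaling, and create missing tables."""
    conn = sqlite3.connect(path)
    # WAL applies to file-backed databases; in-memory ones keep their own mode.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript(SCHEMA)
    return conn
```

With a layout like this, the sighting history for one product is a join from sighting to run filtered by product_id.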
Because the DB is committed, git log data/lidl.db gives you a
timeline of scrape snapshots.