Self-hosted Lidl SI weekly catalog scraper with RSS feed + Gotify push notifications

Python 80.4%
HTML 14.2%
CSS 5%
Dockerfile 0.4%


Latest commit: maks 4f66ceb27b, 2026-05-16 19:36:58 +02:00. CI / lint-and-test (push): successful in 47s.
Apply ruff 0.15 format pass
Pure reformatting from a newer ruff release that tightened a few line-collapse rules; tests untouched. Restores `ruff format --check` to green so the CI workflow at .forgejo/workflows/ci.yml passes again.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
.forgejo/workflows	Add test suite, pre-commit config, and Forgejo CI	2026-04-20 12:27:19 +02:00
data	Add deployment README, monitor page, and run-log schedule admin	2026-04-20 12:13:29 +02:00
src/lidl_monitor	Apply ruff 0.15 format pass	2026-05-16 19:36:58 +02:00
tests	Apply ruff 0.15 format pass	2026-05-16 19:36:58 +02:00
.env.example	Initial import of lidl-monitor	2026-04-20 12:07:01 +02:00
.gitignore	Initial import of lidl-monitor	2026-04-20 12:07:01 +02:00
.pre-commit-config.yaml	Add test suite, pre-commit config, and Forgejo CI	2026-04-20 12:27:19 +02:00
CLAUDE.md	Initial import of lidl-monitor	2026-04-20 12:07:01 +02:00
docker-compose.yml	Initial import of lidl-monitor	2026-04-20 12:07:01 +02:00
Dockerfile	Initial import of lidl-monitor	2026-04-20 12:07:01 +02:00
pyproject.toml	Add test suite, pre-commit config, and Forgejo CI	2026-04-20 12:27:19 +02:00
README.md	Add deployment README, monitor page, and run-log schedule admin	2026-04-20 12:13:29 +02:00
requirements.txt	Initial import of lidl-monitor	2026-04-20 12:07:01 +02:00

README.md

lidl-monitor

Self-hosted scraper for the Lidl Slovenia weekly catalog. Pulls the full SI assortment twice a week, keeps a forever archive in SQLite, exposes a small FastAPI UI + RSS feed, and pushes Gotify notifications when new products match your keywords.

FastAPI + SQLite + APScheduler, packaged as a single Docker container.

Quick deploy

The expected deployment is a Linux host (any architecture Docker supports) that you reach over Tailscale, typically with Tailscale Funnel enabled if you want the RSS feed and links reachable from the public internet (e.g. so your phone's RSS reader can reach it when off-tailnet).

1. Clone and configure

git clone <your-fork-url> lidl-monitor
cd lidl-monitor
cp .env.example .env

Edit .env:

Key	What it does
`BASE_URL`	External URL the app is reachable at. Used for RSS `<link>` tags and Gotify click-through. On a Tailscale Funnel setup this is your `https://<host>.<tailnet>.ts.net` name.
`GOTIFY_URL`, `GOTIFY_TOKEN`	Your Gotify server + app token. Leave both empty to disable push entirely.
`GOTIFY_PRIORITY`	1–10; 5 is a normal notification.
`RUN_DAYS`, `RUN_HOUR`, `RUN_MINUTE`, `TIMEZONE`	Scrape schedule. Defaults (`mon,thu 06:00 Europe/Ljubljana`) match Lidl SI's rotation.
`LIDL_LOCALE`, `LIDL_ASSORTMENT`	Leave as `sl_SI` / `SI` unless scraping another Lidl region.

Don't commit .env — .gitignore already excludes it. The SQLite database (data/lidl.db) is tracked, so the catalog history travels with the repo.
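As an illustration of how the keys in the table fit together, here is a minimal sketch of reading them with the documented defaults. The `Settings` class, the `load_settings` helper, and the fallback `BASE_URL` of `http://localhost:8000` are hypothetical and not taken from the project; the real loading code in src/lidl_monitor may differ.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the .env keys described above."""
    base_url: str
    gotify_url: Optional[str]
    gotify_token: Optional[str]
    gotify_priority: int
    run_days: tuple[str, ...]
    run_hour: int
    run_minute: int
    timezone: str
    locale: str
    assortment: str

    @property
    def push_enabled(self) -> bool:
        # Push is on only when both the server URL and the app token are set.
        return bool(self.gotify_url and self.gotify_token)


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from an environment-like mapping such as os.environ."""

    def get(key: str, default: str = "") -> str:
        # Treat missing and blank values the same way.
        return env.get(key, "").strip() or default

    days = tuple(d.strip().lower() for d in get("RUN_DAYS", "mon,thu").split(",") if d.strip())
    return Settings(
        base_url=get("BASE_URL", "http://localhost:8000").rstrip("/"),  # assumed fallback
        gotify_url=get("GOTIFY_URL") or None,
        gotify_token=get("GOTIFY_TOKEN") or None,
        gotify_priority=int(get("GOTIFY_PRIORITY", "5")),  # assumed default
        run_days=days,
        run_hour=int(get("RUN_HOUR", "6")),
        run_minute=int(get("RUN_MINUTE", "0")),
        timezone=get("TIMEZONE", "Europe/Ljubljana"),
        locale=get("LIDL_LOCALE", "sl_SI"),
        assortment=get("LIDL_ASSORTMENT", "SI"),
    )
```

Calling `load_settings(os.environ)` inside the container would pick up whatever Docker Compose passed through from .env.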

2. Start the container

docker compose up --build -d
docker compose logs -f    # tail

The app listens on http://<host>:8000. The data/ directory is bind-mounted into the container at /data, so the database survives rebuilds.

Sanity check:

curl http://localhost:8000/healthz           # {"ok": true}
curl -X POST http://localhost:8000/admin/refresh  # triggers a scrape
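
The health check can also be scripted, for example from a monitoring job. The sketch below uses only the standard library and relies on the `{"ok": true}` response shown above; the `is_healthy` name and the injectable `opener` parameter are illustrative, not part of the project.

```python
import json
from urllib.request import urlopen


def is_healthy(base_url: str, timeout: float = 5.0, opener=urlopen) -> bool:
    """Return True if GET <base_url>/healthz answers with {"ok": true}."""
    url = f"{base_url.rstrip('/')}/healthz"
    try:
        with opener(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a non-JSON body.
        return False
    return isinstance(payload, dict) and payload.get("ok") is True
```

`is_healthy("http://localhost:8000")` returns False rather than raising when the container is down, which keeps it easy to use in a cron job or a wait-until-ready loop.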

3. Expose with Tailscale Funnel (optional)

If the box is already in your tailnet:

# Make the service reachable to everyone on your tailnet.
sudo tailscale serve --bg --https=443 http://localhost:8000

# Or expose it to the public internet (requires Funnel enabled in the
# admin console for this node).
sudo tailscale funnel --bg --https=443 http://localhost:8000

Then set BASE_URL=https://<host>.<tailnet>.ts.net in .env and restart:

docker compose up -d

RSS subscribers (e.g. in an RSS reader on your phone) can now hit https://<host>.<tailnet>.ts.net/feed.xml from anywhere.
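
For a script rather than an RSS reader, the feed can be read with the standard library. This is a sketch that assumes /feed.xml is a conventional RSS 2.0 document whose `<item>` elements carry `<title>` and `<link>` children; the README does not spell out the exact feed layout, and `feed_items` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def feed_items(xml_text: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        # findtext returns None for a missing child, so fall back to "".
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        items.append((title, link))
    return items
```

The XML text would come from fetching `BASE_URL` + `/feed.xml`, for instance with `urllib.request.urlopen`.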

4. Add keywords

Open /keywords in a browser and add substring or regex patterns. The page shows a live preview of what each keyword would match in the current catalog. The home page's "Latest by keyword" section shows active keyword hits in the current catalog and is re-evaluated on every load, so new keywords work immediately without waiting for the next scrape.
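
To make the difference between substring and regex keywords concrete, here is a small sketch of a matcher and a preview over a list of titles. Case-insensitive matching, the `(pattern, is_regex)` tuple shape, and both function names are assumptions made for illustration; the project's own matching rules may differ.

```python
import re
from typing import Callable, Iterable


def compile_keyword(pattern: str, is_regex: bool = False) -> Callable[[str], bool]:
    """Return a predicate that tests one product title against one keyword."""
    # A substring keyword is escaped so characters like "." match literally.
    source = pattern if is_regex else re.escape(pattern)
    rx = re.compile(source, re.IGNORECASE)  # case-insensitivity is an assumption
    return lambda text: rx.search(text) is not None


def preview(keywords: Iterable[tuple[str, bool]], titles: list[str]) -> dict[str, list[str]]:
    """Map each keyword pattern to the titles it would currently match."""
    result = {}
    for pattern, is_regex in keywords:
        matches = compile_keyword(pattern, is_regex)
        result[pattern] = [t for t in titles if matches(t)]
    return result
```

Because the preview is a pure function of the keyword list and the current titles, recomputing it on every page load is enough to make a newly added keyword take effect immediately.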

New keyword hits produce a Gotify push on the next scrape run. Pushes are idempotent and restart-safe: the match.notified_at column records which hits have already been delivered, so a restart does not send them again.
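
The idempotency pattern can be sketched as follows: select the matches whose notified_at is still NULL, send each one, and stamp the row only after the send succeeds. Apart from `match.notified_at`, the column names, the `notify_pending` function, and the `send` callback are hypothetical; the actual Gotify call is left out.

```python
import sqlite3
from datetime import datetime, timezone

# Minimal stand-in for the match table; only notified_at is named in the README.
# "match" is quoted because MATCH is also an SQL keyword.
SCHEMA = """
CREATE TABLE "match" (
    id INTEGER PRIMARY KEY,
    product_id TEXT NOT NULL,
    notified_at TEXT
)
"""


def notify_pending(conn: sqlite3.Connection, send) -> int:
    """Push every match not yet notified; return how many were delivered."""
    rows = conn.execute(
        'SELECT id, product_id FROM "match" WHERE notified_at IS NULL ORDER BY id'
    ).fetchall()
    sent = 0
    for match_id, product_id in rows:
        try:
            send(product_id)
        except Exception:
            # Leave notified_at NULL so the next run retries this match.
            continue
        stamp = datetime.now(timezone.utc).isoformat()
        conn.execute('UPDATE "match" SET notified_at = ? WHERE id = ?', (stamp, match_id))
        conn.commit()  # commit per row so a crash cannot lose a delivered stamp
        sent += 1
    return sent
```

Running `notify_pending` twice in a row delivers nothing the second time, which is the property that makes a restart in the middle of a scrape harmless.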

Day-to-day

docker compose logs -f                        # follow logs
docker compose up -d                          # after changing .env only (a plain restart does not reload env vars)
docker compose up --build -d                  # after changing code
curl -X POST http://localhost:8000/admin/refresh   # scrape on demand

A typical scrape logs 5 paged HTTP requests and then "fetched N products". If you see a 401 after page 1, the Lidl WAF (Myra Cloud) rejected the request; check the Accept header in fetcher.py. See CLAUDE.md for the gory details.

Endpoints

GET / — home: week list, search box, latest hits per keyword.
GET /search?q=… — substring search across the full archive.
GET /w/{iso_year}/{iso_week} — one week's products + flyer PDF/thumbnail.
GET /p/{product_id} — product detail + sighting history.
GET /feed.xml — RSS of new products. Optional ?q= filter.
GET /keywords — keyword CRUD + live match preview.
POST /admin/refresh — trigger a scrape now.
GET /healthz
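
Two of the routes above take computed values: the week page uses ISO year and ISO week, and the feed accepts an optional ?q= filter. The helpers below are a hypothetical sketch of building those URLs from the client side; whether the app expects the week number zero-padded is not stated, so the unpadded form here is an assumption.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode


def week_path(day: date) -> str:
    """Path of the week page containing `day`, using ISO year and ISO week."""
    iso_year, iso_week, _weekday = day.isocalendar()
    # The ISO year can differ from the calendar year around New Year.
    return f"/w/{iso_year}/{iso_week}"


def feed_url(base_url: str, q: Optional[str] = None) -> str:
    """URL of the RSS feed, optionally filtered with ?q=."""
    url = f"{base_url.rstrip('/')}/feed.xml"
    if q:
        url += "?" + urlencode({"q": q})
    return url
```

For example, `week_path(date(2027, 1, 1))` yields `/w/2026/53`, because 1 January 2027 falls in the last ISO week of 2026.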

Data

Everything lives in data/lidl.db (SQLite, WAL-mode). Schema:

run — one row per scrape cycle.
product — one row per productId ever seen. Never deleted.
sighting — many-to-many of runs × products, with is_new flag and per-run price.
keyword, match — user keywords and the products they've matched (with notified_at tracking Gotify delivery).
flyer — one row per ISO week with the decorative leaflet PDF + thumbnail.
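
To illustrate how run, product, and sighting relate, here is a reduced sketch of the three tables and of recording one sighting. Only the table names, `is_new`, and the per-run price come from the list above; every other column name and the `record_sighting` helper are assumptions, and keyword, match, and flyer are omitted.

```python
import sqlite3

# Hypothetical, simplified schema; the real data/lidl.db may use other columns.
SCHEMA = """
CREATE TABLE run (id INTEGER PRIMARY KEY, started_at TEXT NOT NULL);
CREATE TABLE product (product_id TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE sighting (
    run_id INTEGER NOT NULL REFERENCES run(id),
    product_id TEXT NOT NULL REFERENCES product(product_id),
    is_new INTEGER NOT NULL DEFAULT 0,
    price REAL,
    PRIMARY KEY (run_id, product_id)
);
"""


def record_sighting(conn: sqlite3.Connection, run_id: int, product_id: str,
                    title: str, price: float) -> bool:
    """Store one product seen in one run; return True if it was never seen before."""
    seen_before = conn.execute(
        "SELECT 1 FROM product WHERE product_id = ?", (product_id,)
    ).fetchone() is not None
    if not seen_before:
        # Products are only ever inserted, never deleted.
        conn.execute(
            "INSERT INTO product (product_id, title) VALUES (?, ?)", (product_id, title)
        )
    conn.execute(
        "INSERT INTO sighting (run_id, product_id, is_new, price) VALUES (?, ?, ?, ?)",
        (run_id, product_id, int(not seen_before), price),
    )
    return not seen_before
```

With this layout, a product's price history is a query over sighting joined to run, ordered by the run's start time.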

Because the DB is committed, git log data/lidl.db gives you a timeline of scrape snapshots.
