Self-hosted Lidl SI weekly catalog scraper with RSS feed + Gotify push notifications
maks 4f66ceb27b
Apply ruff 0.15 format pass
Pure reformatting from a newer ruff release that tightened a few
line-collapse rules; tests untouched. Restores `ruff format --check`
to green so the CI workflow at .forgejo/workflows/ci.yml passes again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:36:58 +02:00
.forgejo/workflows Add test suite, pre-commit config, and Forgejo CI 2026-04-20 12:27:19 +02:00
data Add deployment README, monitor page, and run-log schedule admin 2026-04-20 12:13:29 +02:00
src/lidl_monitor Apply ruff 0.15 format pass 2026-05-16 19:36:58 +02:00
tests Apply ruff 0.15 format pass 2026-05-16 19:36:58 +02:00
.env.example Initial import of lidl-monitor 2026-04-20 12:07:01 +02:00
.gitignore Initial import of lidl-monitor 2026-04-20 12:07:01 +02:00
.pre-commit-config.yaml Add test suite, pre-commit config, and Forgejo CI 2026-04-20 12:27:19 +02:00
CLAUDE.md Initial import of lidl-monitor 2026-04-20 12:07:01 +02:00
docker-compose.yml Initial import of lidl-monitor 2026-04-20 12:07:01 +02:00
Dockerfile Initial import of lidl-monitor 2026-04-20 12:07:01 +02:00
pyproject.toml Add test suite, pre-commit config, and Forgejo CI 2026-04-20 12:27:19 +02:00
README.md Add deployment README, monitor page, and run-log schedule admin 2026-04-20 12:13:29 +02:00
requirements.txt Initial import of lidl-monitor 2026-04-20 12:07:01 +02:00

lidl-monitor

Self-hosted scraper for the Lidl Slovenia weekly catalog. It pulls the full SI assortment twice a week, keeps a permanent archive in SQLite, exposes a small FastAPI UI and RSS feed, and pushes Gotify notifications when new products match your keywords.

FastAPI + SQLite + APScheduler, packaged as a single Docker container.

Quick deploy

The expected deployment is a Linux host (any architecture Docker supports) that you reach over Tailscale. Add Tailscale Funnel if you want the RSS feed and links reachable from the public internet (e.g. so your phone's RSS reader can fetch the feed when off-tailnet).

1. Clone and configure

git clone <your-fork-url> lidl-monitor
cd lidl-monitor
cp .env.example .env

Edit .env:

Key What it does
BASE_URL External URL the app is reachable at. Used for RSS <link> tags and Gotify click-through. On a Tailscale Funnel setup this is your https://<host>.<tailnet>.ts.net name.
GOTIFY_URL, GOTIFY_TOKEN Your Gotify server + app token. Leave both empty to disable push entirely.
GOTIFY_PRIORITY 1 to 10; 5 is a normal notification.
RUN_DAYS, RUN_HOUR, RUN_MINUTE, TIMEZONE Scrape schedule. Defaults (mon,thu 06:00 Europe/Ljubljana) match Lidl SI's rotation.
LIDL_LOCALE, LIDL_ASSORTMENT Leave as sl_SI / SI unless scraping another Lidl region.

Don't commit .env; .gitignore already excludes it. The SQLite database (data/lidl.db) is tracked, so the catalog history travels with the repo.
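As a rough illustration of how the keys above could be read at startup, here is a minimal sketch. The variable names and the schedule and locale defaults come from the table; the `Settings` class, the `load_settings` helper, the `http://localhost:8000` fallback for BASE_URL, and the default priority of 5 are assumptions made for the example and may not match what the app does.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the .env keys described above."""

    base_url: str
    gotify_url: str
    gotify_token: str
    gotify_priority: int
    run_days: str
    run_hour: int
    run_minute: int
    timezone: str
    lidl_locale: str
    lidl_assortment: str

    @property
    def push_enabled(self) -> bool:
        # Leaving both Gotify values empty disables push. Treating a
        # half-configured pair as disabled too is an assumption.
        return bool(self.gotify_url and self.gotify_token)


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    priority = int(env.get("GOTIFY_PRIORITY", "5"))  # default is an assumption
    if not 1 <= priority <= 10:
        raise ValueError("GOTIFY_PRIORITY must be between 1 and 10")
    return Settings(
        # The localhost fallback is an assumption, not a documented default.
        base_url=env.get("BASE_URL", "http://localhost:8000").rstrip("/"),
        gotify_url=env.get("GOTIFY_URL", ""),
        gotify_token=env.get("GOTIFY_TOKEN", ""),
        gotify_priority=priority,
        run_days=env.get("RUN_DAYS", "mon,thu"),
        run_hour=int(env.get("RUN_HOUR", "6")),
        run_minute=int(env.get("RUN_MINUTE", "0")),
        timezone=env.get("TIMEZONE", "Europe/Ljubljana"),
        lidl_locale=env.get("LIDL_LOCALE", "sl_SI"),
        lidl_assortment=env.get("LIDL_ASSORTMENT", "SI"),
    )
```

Calling `load_settings({})` yields the documented Monday/Thursday 06:00 schedule with push disabled, which is a quick way to check a new .env before starting the container.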

2. Start the container

docker compose up --build -d
docker compose logs -f    # tail

The app listens on http://<host>:8000. data/ is bind-mounted into the container at /data, so the database survives rebuilds.

Sanity check:

curl http://localhost:8000/healthz           # {"ok": true}
curl -X POST http://localhost:8000/admin/refresh  # triggers a scrape
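The same health check can be scripted, for instance from a cron job or a deploy script. The sketch below uses only the standard library; `parse_health_body` and `is_healthy` are hypothetical helper names, and the only thing taken from the README is that /healthz answers with {"ok": true}.

```python
import json
import urllib.request


def parse_health_body(raw: bytes) -> bool:
    """Return True only for a JSON object whose "ok" field is true."""
    try:
        body = json.loads(raw.decode("utf-8"))
    except ValueError:
        # Covers both invalid UTF-8 and malformed JSON.
        return False
    return isinstance(body, dict) and body.get("ok") is True


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """GET <base_url>/healthz and report whether the app says it is up."""
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_health_body(resp.read())
    except OSError:
        # URLError and HTTPError are OSError subclasses, as are timeouts
        # and refused connections.
        return False
```

For example, `is_healthy("http://localhost:8000")` can gate a follow-up POST to /admin/refresh.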

3. Expose with Tailscale Funnel (optional)

If the box is already in your tailnet:

# Make the service reachable to everyone on your tailnet.
sudo tailscale serve --bg --https=443 http://localhost:8000

# Or expose it to the public internet (requires Funnel enabled in the
# admin console for this node).
sudo tailscale funnel --bg --https=443 http://localhost:8000

Then set BASE_URL=https://<host>.<tailnet>.ts.net in .env and restart:

docker compose up -d

RSS subscribers (e.g. in an RSS reader on your phone) can now hit https://<host>.<tailnet>.ts.net/feed.xml from anywhere.
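To confirm the feed is reachable and well-formed without an RSS reader, a short parser helps. This sketch assumes /feed.xml is a standard RSS 2.0 document (rss > channel > item with title and link children); the sample XML is invented for illustration and is not real output from the app.

```python
import xml.etree.ElementTree as ET

# Invented sample; a real feed would be fetched from <BASE_URL>/feed.xml.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>lidl-monitor</title>
    <item>
      <title>Example product A</title>
      <link>https://host.example.ts.net/p/123</link>
    </item>
    <item>
      <title>Example product B</title>
      <link>https://host.example.ts.net/p/456</link>
    </item>
  </channel>
</rss>
"""


def feed_items(rss_xml):
    """Return a list of {"title", "link"} dicts, one per RSS item."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iterfind("./channel/item")
    ]


items = feed_items(SAMPLE_FEED.strip())
```

If the links in a real feed do not start with your Funnel hostname, BASE_URL is probably still set to its old value.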

4. Add keywords

Open /keywords in a browser and add substring or regex patterns. The page shows a live preview of what each keyword would match in the current catalog. The home page's "Latest by keyword" section lists active keyword hits in the current catalog and is re-evaluated on every load, so new keywords work immediately without waiting for the next scrape.
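The difference between substring and regex keywords can be sketched as follows. The `Keyword` class, the `is_regex` flag, and the case-insensitive comparison are assumptions for illustration; the README only states that both pattern kinds are supported and that matches are previewed against the current catalog. The product names are invented.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyword:
    pattern: str
    is_regex: bool = False  # hypothetical flag distinguishing the two kinds


def matches(keyword: Keyword, product_name: str) -> bool:
    """Check one product name against one keyword (case-insensitively)."""
    if keyword.is_regex:
        return re.search(keyword.pattern, product_name, re.IGNORECASE) is not None
    return keyword.pattern.casefold() in product_name.casefold()


def preview(keywords, product_names):
    """Map each keyword pattern to the product names it would match."""
    return {
        kw.pattern: [name for name in product_names if matches(kw, name)]
        for kw in keywords
    }


# Invented catalog entries, standing in for the current week's products.
catalog = [
    "Parkside akumulatorski vrtalnik",
    "Mleko 3,5 %",
    "Parkside kotna brusilka",
]
hits = preview([Keyword("parkside"), Keyword(r"mleko\s+\d", is_regex=True)], catalog)
```

Because the preview is computed from the catalog on demand rather than stored, a keyword added a second ago already appears in `hits`.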

New keyword hits produce a Gotify push on the next scrape run. Pushes are idempotent and restart-safe: the match.notified_at column records which hits have already been delivered.
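One way a notified_at column can make delivery restart-safe is sketched below: only rows with a NULL timestamp are sent, and each row is stamped and committed right after its push succeeds. The `match` table and `notified_at` column are named in this README; the other columns, the `notify_pending` helper, and the `send` callable (standing in for the Gotify call) are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def notify_pending(conn: sqlite3.Connection, send) -> int:
    """Push every match that has not been notified yet; return the count.

    `send` is any callable taking the product name. It should raise on
    failure so the row stays pending and is retried on the next run.
    """
    rows = conn.execute(
        'SELECT id, product_name FROM "match" '
        "WHERE notified_at IS NULL ORDER BY id"
    ).fetchall()
    sent = 0
    for match_id, product_name in rows:
        send(product_name)
        stamp = datetime.now(timezone.utc).isoformat()
        conn.execute(
            'UPDATE "match" SET notified_at = ? WHERE id = ?', (stamp, match_id)
        )
        # Commit per row so a crash mid-loop never re-sends earlier pushes.
        conn.commit()
        sent += 1
    return sent


# Demo with an in-memory database and a list standing in for Gotify.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "match" ('
    "id INTEGER PRIMARY KEY, product_name TEXT NOT NULL, notified_at TEXT)"
)
conn.executemany(
    'INSERT INTO "match" (product_name) VALUES (?)',
    [("Example product A",), ("Example product B",)],
)
conn.commit()

delivered = []
first_run = notify_pending(conn, delivered.append)
second_run = notify_pending(conn, delivered.append)
```

The second call finds nothing to send, which is the behaviour you want after a container restart.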

Day-to-day

docker compose logs -f                        # follow logs
docker compose restart                        # after changing .env only
docker compose up --build -d                  # after changing code
curl -X POST http://localhost:8000/admin/refresh   # scrape on demand

A typical scrape logs 5 paged HTTP requests plus a "fetched N products" line. If you see a 401 after page 1, the Lidl WAF (Myra Cloud) rejected the request; check the Accept header in fetcher.py. See CLAUDE.md for the gory details.
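The paging behaviour described above can be pictured with a small loop. This is not the code in fetcher.py: `fetch_all`, `WafRejected`, the page-size stopping rule, and the injected `fetch_page` callable (which hides the real URL and headers) are all assumptions made so the sketch stays self-contained.

```python
from typing import Callable, Dict, List, Tuple

Page = Tuple[int, List[Dict]]  # (HTTP status, products on that page)


class WafRejected(RuntimeError):
    """Raised when a page request comes back with HTTP 401."""


def fetch_all(
    fetch_page: Callable[[int], Page], page_size: int, max_pages: int = 100
) -> List[Dict]:
    """Request pages 1, 2, ... until a page comes back short."""
    products: List[Dict] = []
    for page in range(1, max_pages + 1):
        status, items = fetch_page(page)
        if status == 401:
            raise WafRejected(f"HTTP 401 on page {page}; check the Accept header")
        if status != 200:
            raise RuntimeError(f"unexpected HTTP {status} on page {page}")
        products.extend(items)
        if len(items) < page_size:
            break  # a short page means there is nothing left to fetch
    return products


# Fake backend: four full pages of two products, then one short page.
calls = []


def fake_fetch_page(page: int) -> Page:
    calls.append(page)
    count = 2 if page < 5 else 1
    return 200, [{"productId": f"{page}-{i}"} for i in range(count)]


all_products = fetch_all(fake_fetch_page, page_size=2)
```

Surfacing the 401 as its own exception type makes the WAF case easy to spot in logs, as opposed to a generic request failure.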

Endpoints

  • GET / — home: week list, search box, latest hits per keyword.
  • GET /search?q=… — substring search across the full archive.
  • GET /w/{iso_year}/{iso_week} — one week's products + flyer PDF/thumbnail.
  • GET /p/{product_id} — product detail + sighting history.
  • GET /feed.xml — RSS of new products. Optional ?q= filter.
  • GET /keywords — keyword CRUD + live match preview.
  • POST /admin/refresh — trigger a scrape now.
  • GET /healthz
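When linking to these endpoints from scripts, the two parameterized routes are the easiest to get wrong. The helpers below are a sketch with hypothetical names; it assumes the week number in /w/{iso_year}/{iso_week} is not zero-padded, which the README does not specify.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode


def week_path(day: date) -> str:
    """Path of the weekly page containing `day`, using ISO year and week."""
    iso_year, iso_week, _weekday = day.isocalendar()
    return f"/w/{iso_year}/{iso_week}"


def feed_url(base_url: str, query: Optional[str] = None) -> str:
    """URL of the RSS feed, with the optional ?q= filter URL-encoded."""
    url = base_url.rstrip("/") + "/feed.xml"
    if query:
        url += "?" + urlencode({"q": query})
    return url
```

Note that the ISO year can differ from the calendar year around New Year: `week_path(date(2024, 12, 30))` gives `/w/2025/1`.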

Data

Everything lives in data/lidl.db (SQLite, WAL-mode). Schema:

  • run — one row per scrape cycle.
  • product — one row per productId ever seen. Never deleted.
  • sighting — many-to-many of runs × products, with is_new flag and per-run price.
  • keyword, match — user keywords and the products they've matched (with notified_at tracking Gotify delivery).
  • flyer — one row per ISO week with the decorative leaflet PDF + thumbnail.
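To make the run, product, and sighting relationship concrete, here is a cut-down sketch of three of the tables and a price-history query. Only the table names, the is_new flag, and the per-run price come from the list above; every other column name, the `open_db` and `price_history` helpers, and the sample rows are assumptions and will differ from the real schema.

```python
import sqlite3

# Hypothetical, simplified DDL; the real schema has more tables and columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS run (
    id INTEGER PRIMARY KEY,
    started_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS product (
    product_id TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS sighting (
    run_id INTEGER NOT NULL REFERENCES run(id),
    product_id TEXT NOT NULL REFERENCES product(product_id),
    is_new INTEGER NOT NULL,
    price REAL,
    PRIMARY KEY (run_id, product_id)
);
"""


def open_db(path: str) -> sqlite3.Connection:
    """Open the database, request WAL journaling, and ensure tables exist."""
    conn = sqlite3.connect(path)
    # In-memory databases ignore this request; file databases switch to WAL.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript(SCHEMA)
    return conn


def price_history(conn: sqlite3.Connection, product_id: str):
    """Return (run start, price, is_new) for every sighting of a product."""
    return conn.execute(
        "SELECT run.started_at, sighting.price, sighting.is_new "
        "FROM sighting JOIN run ON run.id = sighting.run_id "
        "WHERE sighting.product_id = ? ORDER BY run.started_at",
        (product_id,),
    ).fetchall()


# Invented sample data: one product seen in two runs at different prices.
conn = open_db(":memory:")
conn.executemany("INSERT INTO run VALUES (?, ?)", [(1, "2026-04-20"), (2, "2026-04-23")])
conn.execute("INSERT INTO product VALUES (?, ?)", ("p1", "Example product"))
conn.executemany(
    "INSERT INTO sighting VALUES (?, ?, ?, ?)",
    [(1, "p1", 1, 9.99), (2, "p1", 0, 7.99)],
)
conn.commit()
history = price_history(conn, "p1")
```

Keeping the price on the sighting row, not on the product, is what lets a query like this reconstruct how a product's price changed from run to run.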

Because the DB is committed, git log data/lidl.db gives you a timeline of scrape snapshots.