Program Intro Sync is a standalone monitoring service for tracking external program description changes and matching them to OpenSIST programs.
It is intentionally separate from OpenSIST.github.io. The frontend should not own crawler logic, LLM integration, source state, or automation secrets.
- Monitor external repositories such as OpenCS, GlobalCS, and CSGrad.
- Fetch current OpenSIST program metadata and existing descriptions from the backend.
- Detect new or changed external program descriptions.
- Match external documents to OpenSIST
ProgramID. - Store monitoring state and change events in Cloudflare D1.
- Expose changed description candidates for downstream consumers.
- Never write directly to the OpenSIST production backend from the monitor.
- Do not auto-publish merged descriptions.
- Do not run LLM merge inside the monitoring core.
- Do not store backend cookies, API keys, or LLM credentials in repo files.
- Do not scrape arbitrary websites without source-specific rules and rate limits.
program-intro-sync/
README.md
wrangler.jsonc
package.json
migrations/
0001_initial.sql
docs/
AI_CODING_AGENT.md
current-status-summary.md
architecture.md
description-monitor.md
sync-pipeline.md
cloudflare-llm-research.md
llm-merge-policy.md
llm-summary-evaluation.md
data-model.md
security-and-config.md
memory/
project-memory.md
src/
config/
db/
sources/
opensist/
matching/
pipeline/
utils/The current MVP implements the data monitor only. LLM merge, draft storage, review UI, and backend publishing are intentionally outside this service.
Cloudflare Worker Cron/manual trigger
-> fetch OpenSIST programs/descriptions
-> fetch external sources
-> parse source documents
-> compute content hashes
-> match source docs to ProgramID
-> detect new/changed documents
-> write source state and change events to D1The monitoring core only records facts: source content hashes, match results, and change events. Data fetching and LLM merge are separate modules connected through stored D1 rows.
The first scan may need multiple monitor runs. To stay inside Cloudflare Workers subrequest limits, the monitor batch-writes OpenSIST snapshots and downloads at most MAX_RAW_DOWNLOADS_PER_SOURCE new or changed Markdown files per source per run.
Install dependencies:
npm installThe Cloudflare account and D1 database are already configured in wrangler.jsonc:
account_id = c1e0d935e0f8ba4685b9b6702130efe7
D1 database_name = program-intro-sync
D1 database_id = 3073eab7-e1f8-4e1e-b171-33740db9ad20
MAX_PROGRAM_UPSERTS_PER_RUN = 25
MAX_RAW_DOWNLOADS_PER_SOURCE = 1For a fresh Cloudflare account, create a database and replace database_id in wrangler.jsonc:
npx wrangler d1 create program-intro-syncApply local or remote migrations:
npm run db:migrate:local
npm run db:migrate:remoteSet secrets:
wrangler secret put OPENSIST_COOKIE
wrangler secret put GITHUB_TOKEN
wrangler secret put ADMIN_TOKENRun locally:
npm run devDeploy:
npm run deployAdmin endpoints:
GET /health
POST /admin/monitor/run
GET /admin/monitor/runs?limit=20
GET /admin/monitor/events?status=pending&limit=100
POST /admin/monitor/events/:id/acknowledge
POST /admin/monitor/events/:id/ignore
POST /admin/monitor/events/:id/consumeIf ADMIN_TOKEN is set, admin routes require:
Authorization: Bearer <ADMIN_TOKEN>- AI coding agent guide: start here when asking an AI agent to work on this repo.
- Current status summary: concise snapshot of what is deployed, what works, and what should be fixed next.
- Project memory: persistent context, decisions, and current state.
- Architecture: boundaries, modules, and storage policy.
- Description monitor: monitor scope, event types, and API shape.
- Sync pipeline: step-by-step runtime flow.
- Data model: D1 tables and runtime data shapes.
- Security and config: secrets, logging, license, and rate-limit rules.
- Cloudflare LLM research: Workers AI, AI Gateway, and pricing notes.
- LLM summary evaluation: local-only prompt experiments on existing D1 data.
- LLM merge policy: future DeepSeek/Qwen merge constraints.
- Backend publish API: proposed replacement for PR-based publishing.