---
canonical: "https://yuanhaochen.dev/work/tum-search"
path: "/work/tum-search"
section: "Work"
title: "tum-search"
language: "en"
agentUse: "summary, retrieval, citation, hiring evaluation"
---

# tum-search

tum-search treats campus knowledge search as a systems problem: crawling, summarization, embeddings, graph structure, and live update feedback.

Why this article exists

Search is a good test of whether a system can respect both structure and intent. This project explores how university knowledge can be crawled, summarized, embedded, connected, and updated without treating ranking as only keyword matching.

Problem

Campus knowledge search needs more than text lookup. It needs recursive crawling, concise page summaries, semantic retrieval, graph relationships, freshness signals, and visible progress when the index changes.
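As a rough illustration of the crawling requirement, the sketch below walks a link graph breadth-first with a depth limit and visits each page once. It is not the repository's crawler: the `LINKS` table, the `crawl` function, and the `max_depth` parameter are hypothetical, and an in-memory dictionary stands in for real HTTP fetches so the example runs offline.

```python
from collections import deque

# Hypothetical link graph standing in for fetched pages (not from the repository).
LINKS = {
    "/": ["/study", "/research"],
    "/study": ["/study/programs", "/"],
    "/research": ["/research/labs"],
    "/study/programs": [],
    "/research/labs": ["/study"],
}


def crawl(start, max_depth, get_links=LINKS.get):
    """Breadth-first crawl returning {url: depth}, visiting each page once."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for nxt in get_links(url) or []:
            if nxt not in seen:  # the seen-map prevents loops on cyclic links
                seen[nxt] = depth + 1
                queue.append(nxt)
    return seen


print(crawl("/", 1))
```

The iterative queue is a deliberate substitute for literal recursion: it keeps the depth limit explicit and avoids stack growth on deeply linked sites.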

What shipped

The repository includes a crawler, Gemini-powered summaries, Qdrant/CLIP vector search, knowledge-graph ideas, WebSocket crawl progress, dependency checks, setup scripts, and admin utilities.
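To make the vector-search piece concrete, here is a minimal in-memory sketch of similarity retrieval. It does not use Qdrant or CLIP and is not the project's code: the `cosine` and `top_k` helpers and the toy two-dimensional vectors are assumptions chosen only to show the ranking idea that a vector database performs at scale.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, index, k=2):
    """Return the ids of the k vectors in `index` most similar to `query`."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy embeddings; real embeddings would have hundreds of dimensions.
index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k([1.0, 0.1], index))
```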

Evidence

The README documents the crawler, summarization, vector-search, knowledge-graph, WebSocket update, setup, environment, and admin-tool surfaces.

Inspect path

Inspect the README, `web_server.py`, dependency scripts, crawler/summarization paths, Qdrant configuration, WebSocket update path, and admin scripts for database clearing and summary regeneration.

Boundary

The public README exposes a research/prototype search system, not a production campus search service, validated ranking benchmark, or official university information product.

What changed

Search quality became a systems question: topology, semantics, generated summaries, and update feedback matter together before ranking claims are credible.

Next question

Which signal should be trusted first when graph structure, semantic similarity, freshness, and keyword match disagree?
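One way to frame the question without answering it is to start from a baseline that trusts no single signal. The sketch below uses reciprocal rank fusion, a standard technique that is not described in the README; the signal names, the example rankings, and the constant `k=60` (a conventional default) are assumptions for illustration only.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists into one ordering.

    `rankings` maps a signal name to a list of document ids, best first.
    Each document scores the sum of 1 / (k + rank) over the lists it appears in.
    """
    scores = {}
    for ranked in rankings.values():
        for position, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    # Sort by descending score, breaking ties by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings from three signals that disagree on the best page.
rankings = {
    "graph": ["a", "b", "c"],
    "semantic": ["b", "a", "c"],
    "freshness": ["b", "c", "a"],
}
print(reciprocal_rank_fusion(rankings))
```

Comparing a fused ordering like this against each single-signal ordering shows which signal drives the disagreement for a given query.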

Open public repository

https://github.com/89325516/tum-search