feat(brain): add full-text search alongside semantic; default is now literal
Stefan wanted a real search rather than just "sounds similar". Both modes are now available, and full-text is the default:

- 📝 Literal ("Wortlich"): case-insensitive substring match over Title + Content + Category + Tags. New endpoint /memory/search-text. Full scan via Qdrant scroll, k=50. Finds "cessna" exactly in the content. With a small DB (<1000 entries) the cost of the full scan is not a concern.
- 🧠 Semantic: embedder + score_threshold 0.30. This is the existing /memory/search endpoint. Finds conceptually related entries.

Diagnostic UI: a dropdown next to the search field switches the mode. An info banner states clearly which mode is active.

Why literal is the default: with a small DB, semantic search tends to return false positives with scores of 0.30-0.45 for completely unrelated terms (e.g. "cessna" matched "Tageslog fuehren" with 0.43). Literal search is deterministic and avoids that noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
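The false-positive argument can be illustrated with a small, self-contained sketch. It is not the project's code: `semantic_filter` and `literal_filter` are hypothetical helpers, and the hit list is made up except for the "cessna" / "Tageslog fuehren" pair at 0.43 and the 0.30 threshold, which come from the commit message.

```python
from typing import List, Tuple

SCORE_THRESHOLD = 0.30  # threshold quoted in the commit message


def semantic_filter(hits: List[Tuple[str, float]],
                    threshold: float = SCORE_THRESHOLD) -> List[str]:
    """Keep every title whose similarity score reaches the threshold."""
    return [title for title, score in hits if score >= threshold]


def literal_filter(titles: List[str], query: str) -> List[str]:
    """Keep only titles that contain the query as a case-insensitive substring."""
    q = query.strip().lower()
    if not q:
        return []
    return [title for title in titles if q in title.lower()]


# Scored hits for the query "cessna". The first pair is the example from the
# commit message; the second entry and its score are invented for illustration.
hits = [("Tageslog fuehren", 0.43), ("Cessna 172 checklist", 0.71)]

semantic_titles = semantic_filter(hits)
literal_titles = literal_filter([title for title, _ in hits], "cessna")
```

With these inputs the threshold keeps both entries, including the unrelated one, while the substring check keeps only the entry that actually contains the term.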
@@ -181,6 +181,23 @@ def memory_pinned():
    return [MemoryOut.from_point(p) for p in store().list_pinned()]


@app.get("/memory/search-text", response_model=List[MemoryOut])
def memory_search_text(
    q: str,
    k: int = 50,
    type: Optional[str] = None,
    include_pinned: bool = True,
):
    """Full-text substring search (case-insensitive) over Title + Content +
    Category + Tags. Finds exact terms, e.g. 'cessna' matches 'Cessna 172'.
    Unlike /memory/search (semantic), it returns no 'sounds similar' hits."""
    points = store().search_text(
        q, k=k, type_filter=type,
        exclude_pinned=not include_pinned,
    )
    return [MemoryOut.from_point(p) for p in points]


@app.get("/memory/search", response_model=List[MemoryOut])
def memory_search(
    q: str,
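As a rough illustration of how a client might address the two endpoints, the sketch below builds the request URLs with the standard library only. The helper `search_url`, the `mode` values, and the base URL are assumptions made for this example; the paths and the query parameters `q`, `k`, `type`, and `include_pinned` are taken from the hunk above. Only `q` is passed to /memory/search because the rest of its signature is not visible here.

```python
from typing import Optional
from urllib.parse import urlencode


def search_url(base: str, q: str, mode: str = "text", k: int = 50,
               type: Optional[str] = None, include_pinned: bool = True) -> str:
    """Build the URL for a literal ("text") or semantic search request."""
    if mode == "text":
        params = {"q": q, "k": k, "include_pinned": str(include_pinned).lower()}
        if type is not None:
            params["type"] = type  # optional filter, omitted when not set
        return f"{base}/memory/search-text?{urlencode(params)}"
    if mode == "semantic":
        return f"{base}/memory/search?{urlencode({'q': q})}"
    raise ValueError(f"unknown search mode: {mode!r}")


# Hypothetical local base URL, used only for illustration.
BASE = "http://localhost:8000"
text_url = search_url(BASE, "cessna")
semantic_url = search_url(BASE, "cessna", mode="semantic")
```

Keeping literal search as the default argument mirrors the default chosen in the commit.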
@@ -213,3 +213,56 @@ class VectorStore:

    def count(self) -> int:
        return self.client.count(collection_name=COLLECTION, exact=True).count

    def search_text(
        self,
        query: str,
        k: int = 20,
        type_filter: Optional[str] = None,
        exclude_pinned: bool = False,
    ) -> List[MemoryPoint]:
        """Full-text substring search (case-insensitive) over Title +
        Content + Category + Tags. Unlike search(), this is NOT a
        semantic match: only exact word or partial-word hits.

        Full scan over all (filtered) points. At the expected
        scale (< 1000 entries) this is not a concern."""
        q = (query or "").strip().lower()
        if not q:
            return []
        must = []
        must_not = []
        if type_filter:
            must.append(qm.FieldCondition(key="type", match=qm.MatchValue(value=type_filter)))
        if exclude_pinned:
            must_not.append(qm.FieldCondition(key="pinned", match=qm.MatchValue(value=True)))
        flt = qm.Filter(must=must or None, must_not=must_not or None) if (must or must_not) else None

        matches: List[MemoryPoint] = []
        offset = None
        while True:
            points, offset = self.client.scroll(
                collection_name=COLLECTION,
                scroll_filter=flt,
                limit=200,
                offset=offset,
                with_payload=True,
                with_vectors=False,
            )
            for p in points:
                payload = p.payload or {}
                tags = payload.get("tags")
                tags_str = " ".join(tags) if isinstance(tags, list) else ""
                haystack = " ".join([
                    str(payload.get("title", "")),
                    str(payload.get("content", "")),
                    str(payload.get("category", "")),
                    tags_str,
                ]).lower()
                if q in haystack:
                    matches.append(MemoryPoint.from_qdrant(p))
                    if len(matches) >= k:
                        return matches
            if not offset:
                break
        return matches
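The scan-and-match loop above can be tried without a Qdrant instance. The following is a minimal sketch under stated assumptions, not the committed implementation: `FakeClient` is a hypothetical stand-in that pages through plain payload dicts and returns `(batch, next_offset)`, and `build_haystack` is an illustrative helper. The sketch deliberately differs from the diff in three places: it skips fields whose value is `None` (with `str(payload.get(...))` a stored `None` becomes the text "None"), it stringifies each tag before joining, and it ends the loop on `offset is None` rather than on any falsy offset.

```python
from typing import Any, Dict, List, Optional, Tuple

Payload = Dict[str, Any]


def build_haystack(payload: Payload) -> str:
    """Join the searchable fields into one lowercase string."""
    tags = payload.get("tags")
    tags_str = " ".join(str(t) for t in tags) if isinstance(tags, list) else ""
    fields = [payload.get("title"), payload.get("content"), payload.get("category")]
    parts = [str(f) for f in fields if f is not None]  # skip missing/None fields
    return " ".join(parts + [tags_str]).lower()


class FakeClient:
    """Hypothetical in-memory stand-in for a paged scroll API."""

    def __init__(self, payloads: List[Payload]) -> None:
        self.payloads = payloads

    def scroll(self, limit: int, offset: Optional[int]) -> Tuple[List[Payload], Optional[int]]:
        start = offset or 0
        end = start + limit
        next_offset = end if end < len(self.payloads) else None
        return self.payloads[start:end], next_offset


def search_text(client: FakeClient, query: str, k: int = 20,
                page_size: int = 200) -> List[Payload]:
    """Full scan with early exit once k matches have been collected."""
    q = (query or "").strip().lower()
    if not q:
        return []
    matches: List[Payload] = []
    offset: Optional[int] = None
    while True:
        batch, offset = client.scroll(limit=page_size, offset=offset)
        for payload in batch:
            if q in build_haystack(payload):
                matches.append(payload)
                if len(matches) >= k:
                    return matches
        if offset is None:  # no further page
            break
    return matches


# Made-up sample payloads for illustration.
client = FakeClient([
    {"title": "Cessna 172", "content": None, "tags": ["aviation"]},
    {"title": "Tageslog fuehren"},
    {"title": "Checklist", "tags": ["CESSNA"]},
])
all_hits = search_text(client, "cessna", page_size=1)
first_hit = search_text(client, "cessna", k=1, page_size=1)
```

A page size of 1 is used here only to exercise the pagination path on a three-item list.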