fix: prevent unnecessary full reindex when Qdrant collection already exists #12147

Draft
roomote-v0[bot] wants to merge 1 commit into main from fix/prevent-unnecessary-reindex-12145

roomote-v0 (bot) commented Apr 18, 2026

This PR attempts to address Issue #12145. Feedback and guidance are welcome.

Problem

Users with persistent Qdrant storage were seeing a full reindex every time they reopened VS Code, even though the collection and its data were intact in Qdrant.

Root Causes

  1. getCollectionInfo() treated ALL errors as "collection not found" - Connection failures, timeouts, and other transient errors were caught and returned as null, making initialize() think the collection did not exist and create a new one.

  2. Aggressive error cleanup destroyed valid data - The orchestrator error handler called clearCollection() + clearCacheFile() on any indexing error, even transient ones on an existing collection with valid data.

  3. hasIndexedData() returned false on connection errors - Same issue as root cause 1, causing the orchestrator to skip the incremental scan path and do a full scan.

Changes

src/services/code-index/vector-store/qdrant-client.ts

  • Added isNotFoundError() helper to distinguish 404 (collection missing) from other errors
  • getCollectionInfo() now returns null only for 404 errors; propagates other errors
  • hasIndexedData() lets connection errors propagate instead of silently returning false
  • collectionExists() inherits the same error propagation
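
The 404-only handling described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the error shape (`status` / `response.status`) and the `CollectionClient` interface are assumptions standing in for whatever the real Qdrant client exposes.

```typescript
// Hypothetical error shape: the real client may surface the HTTP status differently.
interface HttpLikeError {
	status?: number
	response?: { status?: number }
}

// True only when the error clearly carries an HTTP 404 status.
function isNotFoundError(error: unknown): boolean {
	if (typeof error !== "object" || error === null) return false
	const e = error as HttpLikeError
	return e.status === 404 || e.response?.status === 404
}

// Stand-in for the subset of the vector-store client used here (an assumption).
interface CollectionClient<T> {
	getCollection(name: string): Promise<T>
}

// Returns null only for a missing collection; every other failure is rethrown
// so callers can tell "missing collection" apart from "unreachable Qdrant".
async function getCollectionInfo<T>(client: CollectionClient<T>, name: string): Promise<T | null> {
	try {
		return await client.getCollection(name)
	} catch (error) {
		if (isNotFoundError(error)) return null
		throw error
	}
}

// Usage: a connection failure is propagated instead of being read as "not found".
const unreachable: CollectionClient<never> = {
	getCollection: () => Promise.reject(new Error("connect ECONNREFUSED")),
}
getCollectionInfo(unreachable, "demo").catch((e: Error) => console.log("propagated:", e.message))
```

With this shape, initialize() would create a collection only when getCollectionInfo() resolves to null, and a transient outage surfaces as an error rather than as an empty index.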

src/services/code-index/orchestrator.ts

  • Error handler now checks collectionCreated flag to decide cleanup behavior
  • New collection (just created): clear collection + cache as before
  • Existing collection (pre-existing data): flush/persist cache instead of clearing, skip clearCollection() to preserve the user's existing index
  • Connection failures before indexing starts: preserve cache for future incremental scan (unchanged)
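
The cleanup decision listed above can be summarized as a small pure function. This is a sketch under assumptions: only `collectionCreated` comes from this PR, while `indexingStarted`, `CleanupAction`, and `decideCleanup` are hypothetical names used for illustration.

```typescript
// Hypothetical action names; the orchestrator itself calls clearCollection(),
// clearCacheFile(), or a cache flush rather than returning a label.
type CleanupAction = "clear-collection-and-cache" | "persist-cache" | "preserve-cache"

interface IndexingFailure {
	// Assumed flag: false when the failure happened before indexing began
	// (for example, Qdrant was unreachable during startup).
	indexingStarted: boolean
	// From the PR: true when initialize() created the collection in this run.
	collectionCreated: boolean
}

function decideCleanup(failure: IndexingFailure): CleanupAction {
	// Connection failure before indexing: keep the cache for a later incremental scan.
	if (!failure.indexingStarted) return "preserve-cache"
	// A collection created in this run holds no pre-existing data, so a full reset is safe.
	if (failure.collectionCreated) return "clear-collection-and-cache"
	// A pre-existing collection keeps its index; flush the cache so the next
	// startup can resume incrementally.
	return "persist-cache"
}

console.log(decideCleanup({ indexingStarted: true, collectionCreated: false }))
```

Keeping the decision separate from the side effects would make the two test cases (new vs existing collection) straightforward to cover without mocking the vector store.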

Tests updated

  • qdrant-client.spec.ts: Non-404 errors now throw instead of creating a collection
  • orchestrator.spec.ts: Split error-cleanup test into two cases (new vs existing collection)

Fixes #12145

…exists

Root cause: getCollectionInfo() treated ALL errors (including connection
failures and timeouts) as "collection not found", causing initialize()
to create a new collection even when one already existed with valid data.
Additionally, the error handler in the orchestrator aggressively cleared
both the collection and cache on any indexing error, destroying existing
indexed data.

Changes:
- getCollectionInfo(): Only return null for 404 (not found) errors;
  propagate other errors (connection failures, timeouts) so callers
  can distinguish "missing collection" from "unreachable Qdrant"
- hasIndexedData(): Let connection errors propagate instead of silently
  returning false (which triggered full reindex)
- collectionExists(): Same error propagation improvement
- Orchestrator error handler: Only clear collection + cache when the
  collection was just created (no pre-existing data to preserve). For
  existing collections, flush/persist the cache instead of clearing it
  so incremental scans can resume on next startup.

Fixes #12145
Development

Successfully merging this pull request may close these issues.

[BUG] Roo code reindex my large codebase everytime I open my repo