
feat: SwiftBuddy Memory Palace V1 & Omni Audio Edge Stabilization #31

Open

solderzzc wants to merge 178 commits into main from feature/swiftbuddy-mempalace-v1

Conversation

@solderzzc


🚀 Summary

This PR merges the feature/swiftbuddy-mempalace-v1 architecture into main. The milestone finalizes our end-to-end multimodal inference pipeline over Gemma 4 (text + audio + vision), refines the SwiftBuddy interface for native macOS, replaces the legacy persona-integration pipeline with fast RAG vector synthesis (Memory Palace), and hardens the core local-inference framework so GitHub Actions can package releases without repository secrets.

🧠 Core Architectural Upgrades

  • Gemma 4 Omni Pipeline Stabilization: Resolved the upstream DraftModelRef speculative-decoding integration failures in Server.swift. Re-exposed the Gemma4Configuration payload structs so that MLXVLM processing can bind scaling integers across the audio-array ingestion layers; native inputEmbedding overrides are restored.
  • Memory Palace Vector Migration: Deprecated the slow, LLM-dependent GraphPalaceService in favor of deterministic semantic search via MemoryPalaceService.synthesizePersonaIndex. RAG directives are appended into dynamic UI states so that identical prompt prefixes keep hitting the MLX KV cache.
  • SSD Inference & Model Loading: Rolled out a Hot Expert LRU cache to eliminate sequential pread() blocking when sliding across large MoE configurations.
  • Zero-Secret DMG Deployment: GitHub runners now package, codesign, and notarize the Apple Silicon app distribution dynamically using Info.plist data.
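A Hot Expert LRU cache of the kind described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation; `ExpertKey` and `ExpertWeights` are hypothetical stand-ins for the real types.

```swift
// Minimal LRU cache sketch for hot MoE expert weights (illustrative only).
final class HotExpertCache<ExpertKey: Hashable, ExpertWeights> {
    private let capacity: Int
    private var storage: [ExpertKey: ExpertWeights] = [:]
    private var order: [ExpertKey] = []   // least-recently-used at index 0

    init(capacity: Int) { self.capacity = capacity }

    func weights(for key: ExpertKey, load: () -> ExpertWeights) -> ExpertWeights {
        if let cached = storage[key] {
            // Hit: move to most-recently-used position.
            order.removeAll { $0 == key }
            order.append(key)
            return cached
        }
        // Miss: load the expert from SSD in one read instead of
        // many small sequential pread() calls.
        let loaded = load()
        storage[key] = loaded
        order.append(key)
        if storage.count > capacity, let evicted = order.first {
            order.removeFirst()
            storage.removeValue(forKey: evicted)
        }
        return loaded
    }
}
```

Keeping the hottest experts resident means the router only pays SSD latency on genuinely cold experts rather than on every layer traversal.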

🎨 Layout & Discovery Aesthetics (SwiftBuddy)

  • Ancient RPG Persona UX: Added immersive visual-novel elements to character bonding, layered beneath system Toolbar bounds for native UI fluidity. Trapped the NSDetectedLayoutRecursion rendering faults caused by SwiftUI's TextEditor boundaries.
  • HuggingFace Pipeline Integrity: Fetches byte storage sizes with native ByteCountFormatter, traps undocumented GGUF metadata returns, and embeds offline-verified "Interactive Summoning" states directly into individual model rows.
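The ByteCountFormatter usage for rendering Hub storage sizes might look like the sketch below; the call site and the sample byte count are illustrative, not taken from the PR.

```swift
import Foundation

// Format a raw byte count from the HF Hub API into a human-readable size.
let formatter = ByteCountFormatter()
formatter.countStyle = .file          // 1000-based units, matching Finder
formatter.allowedUnits = [.useMB, .useGB]

let modelSizeBytes: Int64 = 4_368_439_296   // hypothetical model size
let label = formatter.string(fromByteCount: modelSizeBytes)
// label is a localized string such as "4.37 GB"
```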

🧪 Pipeline Validation

  • DSP math stabilized via SwiftLMTestSTFT harness evaluation points.
  • Zero compilation crashes recorded under swift build -c release.
  • System roles translated out of Gemma Jinja templates to prevent echoing and repetitive output.

solderzzc added 30 commits April 7, 2026 18:53
- Injected an export pipeline guaranteeing MLX Metal library initialization hooks bypass GitHub Actions test environments
- Introduced a currentWing target on ChatViewModel for persona routing
- Intercepted userText to explicitly search SwiftData native memories
- Prepended retrieved factual context invisibly inside system prompts, ensuring low-latency, stable context retention even on weaker models
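The retrieve-and-prepend flow in the bullets above could be sketched as below. `MemoryStore`, `RetrievedFact`, and `buildPrompt` are assumptions for illustration; the PR's actual SwiftData query surface is not shown here.

```swift
// Sketch: intercept the user's text, search local memories, and invisibly
// prepend the retrieved facts so the model sees them as prior context.
struct RetrievedFact { let text: String }

protocol MemoryStore {
    func search(_ query: String, limit: Int) -> [RetrievedFact]
}

func buildPrompt(userText: String, store: MemoryStore) -> String {
    let facts = store.search(userText, limit: 4)
    guard !facts.isEmpty else { return userText }
    let context = facts.map { "- \($0.text)" }.joined(separator: "\n")
    // Prepend context ahead of the user's message rather than in a
    // separate system role, which some chat templates handle poorly.
    return "Known facts about the user:\n\(context)\n\n\(userText)"
}
```

Keeping the retrieved block in a fixed position at the front of the prompt also keeps the shared prefix stable, which helps KV-cache reuse across turns.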
…rference and make downloaded models directly tappable to load
…o prevent macOS layout recursion crashes resulting in blank models
…cursive background querying for HF Hub discovery
…skeleton constraints for HuggingFace Hub modal layout
…ng to RegistryService to trace GitHub API access drops
…hed persona.json and statically request known room txt files
… WAL transaction flooding during massive persona corpus ingestion
…oops by converting TextEditor blocks to vertical TextFields inside iOS/macOS active ScrollViews
…ine and introduce Native graphical Map hierarchy for memory rooms
…natively into ChatView toolbars for RAG identity mapping
…tly reflect the currently selected memory persona wing
…try and pivot root Navigation to a primary Friends List model
…ectures by forcefully prepending RAG variables linearly against raw User instructions rather than allocating hostile System Role bounds
solderzzc and others added 30 commits April 12, 2026 16:34
This commit ties the SwiftLM root back to the stabilized mlx-swift-lm papps-ssd-streaming branch containing the architectural RoPE phase bug fixes and LM head initialization fixes. It also updates Server.swift routing to match the Omni input payloads.
…o tower

The ALMModelFactory used LLMModelFactory (text-only) which never loaded the
Gemma4 audio tower or extracted mel features. Switching to OmniModelFactory
ensures the VLM weights (audio_tower, embed_audio) are loaded and Gemma4's
native prepareForMultimodal path extracts audio features into LMInput.audio,
enabling real audio grounding instead of mock token interleaving.
Replace SoundHelix MP3 download with macOS 'say' TTS for a reproducible,
network-independent speech transcription test. Update Turn 2 closed-loop
question to 'what animal is mentioned' (tests context reasoning on 'fox')
instead of music-specific instrument question.
… quants

- ALMModelFactory._load now delegates to VLMModelFactory instead of
  LLMModelFactory, ensuring the Gemma4 audio tower and VLM weights are
  loaded for both 4-bit and 8-bit quantizations.
- Add port-drain wait + log truncation before starting ALM test server
  to prevent health check from hitting a stale VLM process on port 5431.
- ALM payloads: bump max_tokens 100→500 and add thinking:false to
  suppress reasoning chain tokens in transcription output.
- Benchmark: update Turn 2 closed-loop question to test factual
  comprehension of the transcribed speech content.
ThinkingStateTracker (Server.swift):
- Add <|channel|>thought...<channel|> Gemma4 native thinking token format
  alongside existing Qwen3/DeepSeek <thinking>...</thinking> support
- Refactor partial-match into isPartialThinkingTag() helper to avoid
  Swift type-checker timeout on long || chains

ALM benchmark (run_benchmark.sh):
- Raise max_tokens 100→500 (Turn 1) and 100→200 (Turn 2) for full output
- Add enable_thinking=false to both payloads to disable CoT at request level
- Update prompts: 'Transcribe word for word, output only transcription'
- Turn 2 closed-loop: 'summarize what the speaker said' (model quality test)
- Strip thinking blocks in bash result extraction as belt-and-suspenders fallback
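The isPartialThinkingTag() refactor mentioned above might look like this sketch. The tag list is inferred from the formats named in the commit messages (Qwen3/DeepSeek and the Gemma4 channel format); the real helper may track more state.

```swift
// Returns true when `suffix` (the tail of the streamed output) could be
// the beginning of a known thinking-tag opener. Looping over candidates
// avoids the long `||` chain that made the Swift type checker time out.
func isPartialThinkingTag(_ suffix: String) -> Bool {
    // Opening tags named in the commits; illustrative, not exhaustive.
    let openingTags = ["<thinking>", "<think>", "<|channel|>thought"]
    return !suffix.isEmpty && openingTags.contains { $0.hasPrefix(suffix) }
}
```

Checking `tag.hasPrefix(suffix)` (rather than the reverse) is what makes this a partial-match test: a stream ending in `"<thin"` matches both `<thinking>` and `<think>`, so emission is held back until the tag resolves.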
The error was a combination of two bugs:
1. gen_tokens=0 from a stale server left behind between builds (not a
   code bug — killall SwiftLM in the benchmark now reliably cleans up)
2. Python extractor treated empty content (gen_tokens=0) as 'ERROR' and
   aborted, masking the real underlying issue with a misleading message

Fixes:
- Use correct regex <|channel|>thought...<channel|> (not the old broken
  lookahead pattern that never matched Gemma4's actual thinking format)
- Empty Turn 1 response now prints [WARN: gen_tokens=X, empty response]
  and continues instead of aborting the test run
- Empty Turn 2 prints [empty] to make the zero-token case observable
- Crash detection (server connection dropped) is now a distinct error
  message, separate from model-produced-empty-content
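The commit strips thinking blocks in bash during result extraction; the same belt-and-suspenders pass could be expressed in Swift as below. The delimiter pattern follows the `<|channel|>thought...<channel|>` format named above; the function name is illustrative.

```swift
import Foundation

// Remove Gemma4-style thinking blocks from model output before scoring.
// (?s) lets `.` match newlines inside multi-line thinking spans.
func stripThinking(_ output: String) -> String {
    let pattern = #"(?s)<\|channel\|>thought.*?<channel\|>"#
    return output.replacingOccurrences(
        of: pattern,
        with: "",
        options: [.regularExpression]
    )
}
```

Using a lazy `.*?` keeps the match from swallowing everything between the first opener and the last closer when several blocks appear in one response.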
- Update README to document the SharpAI/mlx fork and streaming logic
- Resolve swift-metrics package bounds
- Fall back softly to string lengths when tokenized prompt lengths are unavailable
- Adapt ThinkingStateTracker parsing for generic <think> tags over custom ones
- Inline ALM type logic for Whisper registration
- Resolve the onChange parameter deprecations
- Safely unroll directory enumerators with sequence unwrapping
- Use let instead of var in chat payloads to prevent mutability warnings
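"Safely unroll directory enumerators with sequence unwrapping" presumably refers to the standard pattern below: FileManager's enumerator is optional and yields untyped elements, so both the enumerator and each element need unwrapping. The `/tmp` path is only an example.

```swift
import Foundation

// FileManager.enumerator(atPath:) returns an optional NSDirectoryEnumerator
// whose elements are `Any`; `for case let ... as String` safely unwraps
// each path without force casts.
let fm = FileManager.default
if let enumerator = fm.enumerator(atPath: "/tmp") {
    for case let path as String in enumerator {
        print(path)
    }
}
```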
The prompt cache stores KV state keyed on text tokens only. When a
multimodal request (with image or audio) arrived after text-only requests
with shared prefixes (e.g. BOS, system prompt), the cache was hitting and
creating a trimmed LMInput that discarded the image/audio from LMInput.

This caused Gemma4.prepare() to see input.image == nil and input.audio == nil
even though the processor had correctly set processedImage/processedAudio,
so getInputEmbeddings() skipped the vision/audio feature injection entirely.

Symptom: Model would respond 'There is no audio clip provided to transcribe'
despite audio tokens being present in the prompt token sequence.

Fix: Add isMultimodalRequest guard before the prompt cache restore call.
Multimodal requests always take the full-prefill path so that prepare()
receives the complete LMInput with both modalities intact.
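The fix described above amounts to a guard in front of the prompt-cache restore. This sketch uses hypothetical names (`promptCache`, `runTrimmedPrefill`, `runFullPrefill`) since Server.swift itself is not shown; only the guard condition comes from the commit message.

```swift
// Sketch: skip the KV prompt-cache restore for any request carrying image
// or audio, so Gemma4.prepare() always receives the complete LMInput.
// `input`, `promptCache`, and the two prefill paths are illustrative names.
let isMultimodalRequest = input.image != nil || input.audio != nil

if !isMultimodalRequest, let cached = promptCache.restore(matching: input.text) {
    // Text-only: safe to reuse the cached KV prefix and trim the prompt.
    runTrimmedPrefill(from: cached, input: input)
} else {
    // Multimodal (or cache miss): the full-prefill path keeps both
    // modalities intact instead of trimming them away with the prefix.
    runFullPrefill(input: input)
}
```

The key design point is that the cache is keyed on text tokens only, so a cache hit on a shared text prefix is never evidence that the non-text modalities match.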
Points to fix/audio-fft-packing @ f2ef61f which contains:
- ConformerBlock residual/normalization architectural fix
- conv_norm corrected to AudioRMSNorm
- Debug diagnostics stripped

PR open at: SharpAI/mlx-swift-lm#fix/audio-fft-packing → main
Audio conformer pipeline now merged. Submodule tracks:
ea87c09 Merge pull request #14 from SharpAI/fix/audio-fft-packing
That model has audio_config=null (VLM-only, no audio tower) and will
always respond 'no audio provided' regardless of input. Added inline
comment explaining the restriction.

Also capture latest profiling results.
