
feat: SwiftBuddy Memory Palace V1 & Omni Audio Edge Stabilization #31

Open

solderzzc wants to merge 178 commits into main from feature/swiftbuddy-mempalace-v1

Conversation

@solderzzc


🚀 Summary

This PR merges the feature/swiftbuddy-mempalace-v1 architecture into main. The milestone finalizes our end-to-end multimodal inference pipeline over Gemma 4 (text + audio + vision), refines the SwiftBuddy interface for native macOS, replaces the legacy persona-integration pipeline with fast RAG vector synthesis (Memory Palace), and hardens the core local-inference framework so GitHub Actions can package releases without repository secrets.

🧠 Core Architectural Upgrades

  • Gemma 4 Omni Pipeline Stabilization: Resolved the upstream DraftModelRef speculative-decoding integration failures in Server.swift. Re-exposed the Gemma4Configuration payload structs so that MLXVLM processing can bind scaling integers across the audio-array ingestion layers; native inputEmbedding overrides are restored.
  • Memory Palace Vector Migration: Deprecated the slow, LLM-dependent GraphPalaceService in favor of deterministic semantic search via MemoryPalaceService.synthesizePersonaIndex. RAG directives are appended into dynamic UI states so that identical prompt prefixes keep hitting the MLX KV cache.
  • SSD Inference & Model Loading: Rolled out a Hot Expert LRU cache to eliminate sequential pread() blocking when sliding across large MoE configurations.
  • Zero-Secret DMG Deployment: GitHub runners now package, codesign, and notarize the Apple Silicon app distribution dynamically using Info.plist data.
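A Hot Expert LRU cache of the kind described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation; `ExpertKey` and `ExpertWeights` are hypothetical stand-ins for the real types.

```swift
// Minimal LRU cache sketch for hot MoE expert weights (illustrative only).
final class HotExpertCache<ExpertKey: Hashable, ExpertWeights> {
    private let capacity: Int
    private var storage: [ExpertKey: ExpertWeights] = [:]
    private var order: [ExpertKey] = []   // least-recently-used at index 0

    init(capacity: Int) { self.capacity = capacity }

    func weights(for key: ExpertKey, load: () -> ExpertWeights) -> ExpertWeights {
        if let cached = storage[key] {
            // Hit: move to most-recently-used position.
            order.removeAll { $0 == key }
            order.append(key)
            return cached
        }
        // Miss: load the expert from SSD in one read instead of
        // many small sequential pread() calls.
        let loaded = load()
        storage[key] = loaded
        order.append(key)
        if storage.count > capacity, let evicted = order.first {
            order.removeFirst()
            storage.removeValue(forKey: evicted)
        }
        return loaded
    }
}
```

Keeping the hottest experts resident means the router only pays SSD latency on genuinely cold experts rather than on every layer traversal.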

🎨 Layout & Discovery Aesthetics (SwiftBuddy)

  • Ancient RPG Persona UX: Added immersive visual-novel elements to character bonding, layered beneath system Toolbar bounds for native UI fluidity. Trapped the NSDetectedLayoutRecursion rendering faults caused by SwiftUI's TextEditor boundaries.
  • HuggingFace Pipeline Integrity: Fetches byte storage sizes with native ByteCountFormatter, traps undocumented GGUF metadata returns, and embeds offline-verified "Interactive Summoning" states directly into individual model rows.
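The ByteCountFormatter usage for rendering Hub storage sizes might look like the sketch below; the call site and the sample byte count are illustrative, not taken from the PR.

```swift
import Foundation

// Format a raw byte count from the HF Hub API into a human-readable size.
let formatter = ByteCountFormatter()
formatter.countStyle = .file          // 1000-based units, matching Finder
formatter.allowedUnits = [.useMB, .useGB]

let modelSizeBytes: Int64 = 4_368_439_296   // hypothetical model size
let label = formatter.string(fromByteCount: modelSizeBytes)
// label is a localized string such as "4.37 GB"
```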

🧪 Pipeline Validation

  • DSP math stabilized via SwiftLMTestSTFT harness evaluation points.
  • Zero compilation crashes recorded under swift build -c release.
  • System roles translated out of Gemma Jinja templates to prevent echoing and repetitive output.

solderzzc added 30 commits April 7, 2026 18:53
- Injected an export pipeline guaranteeing MLX Metal library initialization hooks bypass GitHub Actions test environments
- Introduced a currentWing target on ChatViewModel for persona routing
- Intercepted userText to explicitly search SwiftData native memories
- Prepended retrieved factual context invisibly inside system prompts, ensuring low-latency, stable context retention even on weaker models
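The retrieve-and-prepend flow in the bullets above could be sketched as below. `MemoryStore`, `RetrievedFact`, and `buildPrompt` are assumptions for illustration; the PR's actual SwiftData query surface is not shown here.

```swift
// Sketch: intercept the user's text, search local memories, and invisibly
// prepend the retrieved facts so the model sees them as prior context.
struct RetrievedFact { let text: String }

protocol MemoryStore {
    func search(_ query: String, limit: Int) -> [RetrievedFact]
}

func buildPrompt(userText: String, store: MemoryStore) -> String {
    let facts = store.search(userText, limit: 4)
    guard !facts.isEmpty else { return userText }
    let context = facts.map { "- \($0.text)" }.joined(separator: "\n")
    // Prepend context ahead of the user's message rather than in a
    // separate system role, which some chat templates handle poorly.
    return "Known facts about the user:\n\(context)\n\n\(userText)"
}
```

Keeping the retrieved block in a fixed position at the front of the prompt also keeps the shared prefix stable, which helps KV-cache reuse across turns.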
…rference and make downloaded models directly tappable to load
…o prevent macOS layout recursion crashes resulting in blank models
…cursive background querying for HF Hub discovery
…skeleton constraints for HuggingFace Hub modal layout
…ng to RegistryService to trace GitHub API access drops
…hed persona.json and statically request known room txt files
… WAL transaction flooding during massive persona corpus ingestion
…oops by converting TextEditor blocks to vertical TextFields inside iOS/macOS active ScrollViews
…ine and introduce Native graphical Map hierarchy for memory rooms
…natively into ChatView toolbars for RAG identity mapping
…tly reflect the currently selected memory persona wing
…try and pivot root Navigation to a primary Friends List model
…ectures by forcefully prepending RAG variables linearly against raw User instructions rather than allocating hostile System Role bounds
solderzzc and others added 30 commits April 12, 2026 16:34
This commit ties the SwiftLM root back to the stabilized mlx-swift-lm papps-ssd-streaming branch containing the architectural RoPE phase bug fixes and LM head initialization fixes. It also updates Server.swift routing to match the Omni input payloads.
…o tower

The ALMModelFactory used LLMModelFactory (text-only) which never loaded the
Gemma4 audio tower or extracted mel features. Switching to OmniModelFactory
ensures the VLM weights (audio_tower, embed_audio) are loaded and Gemma4's
native prepareForMultimodal path extracts audio features into LMInput.audio,
enabling real audio grounding instead of mock token interleaving.
Replace SoundHelix MP3 download with macOS 'say' TTS for a reproducible,
network-independent speech transcription test. Update Turn 2 closed-loop
question to 'what animal is mentioned' (tests context reasoning on 'fox')
instead of music-specific instrument question.
… quants

- ALMModelFactory._load now delegates to VLMModelFactory instead of
  LLMModelFactory, ensuring the Gemma4 audio tower and VLM weights are
  loaded for both 4-bit and 8-bit quantizations.
- Add port-drain wait + log truncation before starting ALM test server
  to prevent health check from hitting a stale VLM process on port 5431.
- ALM payloads: bump max_tokens 100→500 and add thinking:false to
  suppress reasoning chain tokens in transcription output.
- Benchmark: update Turn 2 closed-loop question to test factual
  comprehension of the transcribed speech content.
ThinkingStateTracker (Server.swift):
- Add <|channel|>thought...<channel|> Gemma4 native thinking token format
  alongside existing Qwen3/DeepSeek <thinking>...</thinking> support
- Refactor partial-match into isPartialThinkingTag() helper to avoid
  Swift type-checker timeout on long || chains

ALM benchmark (run_benchmark.sh):
- Raise max_tokens 100→500 (Turn 1) and 100→200 (Turn 2) for full output
- Add enable_thinking=false to both payloads to disable CoT at request level
- Update prompts: 'Transcribe word for word, output only transcription'
- Turn 2 closed-loop: 'summarize what the speaker said' (model quality test)
- Strip thinking blocks in bash result extraction as belt-and-suspenders fallback
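The isPartialThinkingTag() refactor mentioned above might look like this sketch. The tag list is inferred from the formats named in the commit messages (Qwen3/DeepSeek and the Gemma4 channel format); the real helper may track more state.

```swift
// Returns true when `suffix` (the tail of the streamed output) could be
// the beginning of a known thinking-tag opener. Looping over candidates
// avoids the long `||` chain that made the Swift type checker time out.
func isPartialThinkingTag(_ suffix: String) -> Bool {
    // Opening tags named in the commits; illustrative, not exhaustive.
    let openingTags = ["<thinking>", "<think>", "<|channel|>thought"]
    return !suffix.isEmpty && openingTags.contains { $0.hasPrefix(suffix) }
}
```

Checking `tag.hasPrefix(suffix)` (rather than the reverse) is what makes this a partial-match test: a stream ending in `"<thin"` matches both `<thinking>` and `<think>`, so emission is held back until the tag resolves.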
The error was a combination of two bugs:
1. gen_tokens=0 from a stale server left behind between builds (not a
   code bug — killall SwiftLM in the benchmark now reliably cleans up)
2. Python extractor treated empty content (gen_tokens=0) as 'ERROR' and
   aborted, masking the real underlying issue with a misleading message

Fixes:
- Use correct regex <|channel|>thought...<channel|> (not the old broken
  lookahead pattern that never matched Gemma4's actual thinking format)
- Empty Turn 1 response now prints [WARN: gen_tokens=X, empty response]
  and continues instead of aborting the test run
- Empty Turn 2 prints [empty] to make the zero-token case observable
- Crash detection (server connection dropped) is now a distinct error
  message, separate from model-produced-empty-content
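The commit strips thinking blocks in bash during result extraction; the same belt-and-suspenders pass could be expressed in Swift as below. The delimiter pattern follows the `<|channel|>thought...<channel|>` format named above; the function name is illustrative.

```swift
import Foundation

// Remove Gemma4-style thinking blocks from model output before scoring.
// (?s) lets `.` match newlines inside multi-line thinking spans.
func stripThinking(_ output: String) -> String {
    let pattern = #"(?s)<\|channel\|>thought.*?<channel\|>"#
    return output.replacingOccurrences(
        of: pattern,
        with: "",
        options: [.regularExpression]
    )
}
```

Using a lazy `.*?` keeps the match from swallowing everything between the first opener and the last closer when several blocks appear in one response.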
- Update README to document the SharpAI/mlx fork and streaming logic
- Resolve swift-metrics package bounds
- Fall back softly to string lengths when tokenized prompt lengths are unavailable
- Adapt ThinkingStateTracker parsing for generic <think> tags over custom ones
- Inline ALM type logic for Whisper registration
- Resolve the onChange parameter deprecations
- Safely unroll directory enumerators with sequence unwrapping
- Use let instead of var in chat payloads to prevent mutability warnings
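"Safely unroll directory enumerators with sequence unwrapping" presumably refers to the standard pattern below: FileManager's enumerator is optional and yields untyped elements, so both the enumerator and each element need unwrapping. The `/tmp` path is only an example.

```swift
import Foundation

// FileManager.enumerator(atPath:) returns an optional NSDirectoryEnumerator
// whose elements are `Any`; `for case let ... as String` safely unwraps
// each path without force casts.
let fm = FileManager.default
if let enumerator = fm.enumerator(atPath: "/tmp") {
    for case let path as String in enumerator {
        print(path)
    }
}
```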
The prompt cache stores KV state keyed on text tokens only. When a
multimodal request (with image or audio) arrived after text-only requests
with shared prefixes (e.g. BOS, system prompt), the cache was hitting and
creating a trimmed LMInput that discarded the image/audio from LMInput.

This caused Gemma4.prepare() to see input.image == nil and input.audio == nil
even though the processor had correctly set processedImage/processedAudio,
so getInputEmbeddings() skipped the vision/audio feature injection entirely.

Symptom: Model would respond 'There is no audio clip provided to transcribe'
despite audio tokens being present in the prompt token sequence.

Fix: Add isMultimodalRequest guard before the prompt cache restore call.
Multimodal requests always take the full-prefill path so that prepare()
receives the complete LMInput with both modalities intact.
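The fix described above amounts to a guard in front of the prompt-cache restore. This sketch uses hypothetical names (`promptCache`, `runTrimmedPrefill`, `runFullPrefill`) since Server.swift itself is not shown; only the guard condition comes from the commit message.

```swift
// Sketch: skip the KV prompt-cache restore for any request carrying image
// or audio, so Gemma4.prepare() always receives the complete LMInput.
// `input`, `promptCache`, and the two prefill paths are illustrative names.
let isMultimodalRequest = input.image != nil || input.audio != nil

if !isMultimodalRequest, let cached = promptCache.restore(matching: input.text) {
    // Text-only: safe to reuse the cached KV prefix and trim the prompt.
    runTrimmedPrefill(from: cached, input: input)
} else {
    // Multimodal (or cache miss): the full-prefill path keeps both
    // modalities intact instead of trimming them away with the prefix.
    runFullPrefill(input: input)
}
```

The key design point is that the cache is keyed on text tokens only, so a cache hit on a shared text prefix is never evidence that the non-text modalities match.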
Points to fix/audio-fft-packing @ f2ef61f which contains:
- ConformerBlock residual/normalization architectural fix
- conv_norm corrected to AudioRMSNorm
- Debug diagnostics stripped

PR open at: SharpAI/mlx-swift-lm#fix/audio-fft-packing → main
Audio conformer pipeline now merged. Submodule tracks:
ea87c09 Merge pull request #14 from SharpAI/fix/audio-fft-packing
That model has audio_config=null (VLM-only, no audio tower) and will
always respond 'no audio provided' regardless of input. Added inline
comment explaining the restriction.

Also capture latest profiling results.
