CM ReadIt: Web Audio Experience Skill
Philosophy: Reading is passive. Listening is intimate. Voice builds trust faster than any headline.
Core Principle: Zero dependencies. Progressive enhancement. Respect user's device and preferences.
🎯 Selective Reading Rule (MANDATORY)
| File | Status | When to Read |
|---|---|---|
| tts-engine.md | 🔴 REQUIRED | Adding TTS / read-aloud to any page |
| audio-player.md | ⚪ Optional | Pre-recorded MP3 playback |
| voice-cro.md | ⚪ Optional | Trigger-based voice sales / CRO |
| ui-patterns.md | ⚪ Optional | Player bar & bottom sheet design |

🔴 tts-engine.md = ALWAYS READ when implementing TTS. Others = only if relevant.
Quick Decision Tree
"I need audio on my website"
│
├─ Read article content aloud (text-to-speech)
│  └─ Use: TTS Engine → tts-engine.md
│     ├─ Blog / article pages → Content Reader pattern
│     ├─ Documentation → Section Reader pattern
│     └─ E-commerce → Product Description Reader pattern
│
├─ Play pre-recorded audio files (MP3/WAV)
│  └─ Use: Audio Player → audio-player.md
│     ├─ Podcasts / interviews → Playlist pattern
│     ├─ Sales pitch / welcome → Triggered playback
│     └─ Background ambient → Loop pattern
│
├─ Voice-based conversion optimization (CRO)
│  └─ Use: Voice CRO → voice-cro.md
│     ├─ Landing pages → Trigger-based bottom sheet
│     ├─ Service pages → Per-page audio scripts
│     └─ Course pages → Social proof audio
│
└─ Combination (TTS + CRO)
   ├─ Read tts-engine.md + voice-cro.md
   └─ Ensure no conflict (TTS reader vs CRO player)

🧠 Core Principles (Internalize These)
1. The 3 Audio Engines
| Engine | API | Source | Best For |
|---|---|---|---|
| TTS Reader | SpeechSynthesis | Page text content | Blogs, articles, docs |
| Audio Player | HTMLAudioElement | Pre-recorded MP3 | Sales, podcasts, guides |
| Voice CRO | Audio + triggers | MP3 + behavior detection | Landing pages, sales |
2. Progressive Enhancement
Feature detection → Graceful degradation → Never break the page
if (!('speechSynthesis' in window)) return; // TTS
if (!window.Audio) return; // Audio
Rule: Audio features are ENHANCEMENTS. The page must function 100% without them.
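
The two guards above are bare `return` statements, so they only make sense inside a wrapping function such as an IIFE. A minimal sketch of the same checks as a reusable helper might look like this. The name `detectAudioSupport`, its return shape, and the optional `win` parameter (there only so the check can be exercised outside a browser) are assumptions for illustration, not part of the skill.

```javascript
// Hypothetical helper: reports which audio features the environment exposes.
// `win` defaults to the global window when one exists.
function detectAudioSupport(win) {
  const w = win || (typeof window !== 'undefined' ? window : {});
  return {
    // TTS needs both the synthesizer and the utterance constructor
    tts: 'speechSynthesis' in w && typeof w.SpeechSynthesisUtterance === 'function',
    // Pre-recorded playback needs the Audio constructor
    audio: typeof w.Audio === 'function',
  };
}
```

A caller would inject the trigger button only when `detectAudioSupport().tts` is true, and otherwise leave the page untouched.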
3. Content Extraction Principle
Clone → Strip → Clean → Split → Speak
DON'T read the raw DOM.
DO clone, remove noise, extract clean text.
Strip list (always remove before speaking):
- CTAs, promotions, ads
- Navigation, footer, sidebar
- Images, videos, iframes, SVGs
- Scripts, styles, hidden elements
- Tags, badges, metadata
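
The clone, strip, and clean steps could be sketched roughly as follows. This is an illustration under assumptions: the function name `extractReadableText` is hypothetical, and the class selectors (`.cta`, `.promo`, `.ad`, `.tags`, `.badge`, `.meta`) are placeholders that would need to match the real markup of the page.

```javascript
// Placeholder strip list; the class names are assumptions, adjust per site.
const STRIP_SELECTORS = [
  'nav', 'footer', 'aside',               // navigation, footer, sidebar
  'img', 'video', 'iframe', 'svg',        // media
  'script', 'style', '[hidden]',          // scripts, styles, hidden elements
  '.cta', '.promo', '.ad',                // CTAs, promotions, ads
  '.tags', '.badge', '.meta',             // tags, badges, metadata
];

// Hypothetical helper: returns clean text without touching the live DOM.
function extractReadableText(container, stripSelectors = STRIP_SELECTORS) {
  const clone = container.cloneNode(true);            // Clone
  for (const el of clone.querySelectorAll(stripSelectors.join(','))) {
    el.remove();                                       // Strip
  }
  return clone.textContent.replace(/\s+/g, ' ').trim(); // Clean
}
```

Because the work happens on a clone, the visible page is never modified, which keeps the feature a pure enhancement.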
4. The Chunking Problem
Long utterances are unreliable: depending on the browser and OS speech engine, text beyond roughly 3000-5000 characters may be cut off or fail silently. Long text must be split into chunks.
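
One way to do the splitting is sketched below, assuming a cap of 2500 characters per chunk and sentence boundaries at `.`, `!`, `?`, or a newline. The name `chunkText` is hypothetical. As a last resort the sketch hard-splits a single sentence that is longer than the cap, which is a design choice of this example rather than something the skill prescribes.

```javascript
// Hypothetical helper: packs whole sentences into chunks of at most maxLen chars.
function chunkText(text, maxLen = 2500) {
  // A sentence ends at . ! ? followed by whitespace/end of input, or at a newline.
  // The lookahead keeps decimals such as "3.14" in one piece.
  const sentences = (text.match(/[\s\S]*?(?:[.!?]+(?=\s|$)|\n|$)/g) || [])
    .map((s) => s.trim())
    .filter(Boolean);

  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    // Normally one pass; only an over-long sentence is cut into pieces.
    for (let i = 0; i < sentence.length; i += maxLen) {
      const piece = sentence.slice(i, i + maxLen);
      if (current && current.length + 1 + piece.length > maxLen) {
        chunks.push(current);
        current = '';
      }
      current = current ? current + ' ' + piece : piece;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each returned chunk can then become one utterance, with the next chunk spoken from the previous utterance's `onend` handler.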
Split Strategy:
├─ Split on sentence boundaries (. ! ? \n)
├─ Max chunk: 2500 chars (safe across all browsers)
├─ Preserve sentence integrity (never split mid-sentence)
└─ Chain chunks via onend callback

5. Voice Selection Priority
Language voices:
1. Local service voice (faster, works offline)
2. Network voice (higher quality, needs internet)
3. Any voice matching language prefix
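4. null (browser default)

6. Chrome Keep-Alive Bug
⚠️ CRITICAL: Chrome silently stops SpeechSynthesis after ~15 seconds of continuous speech. This is the #1 gotcha.
// Workaround: pause/resume every 10s so Chrome never reaches the ~15s cutoff
const synth = window.speechSynthesis;
const keepAliveTimer = setInterval(() => {
  // Only nudge the engine while it is actively speaking
  if (synth.speaking && !synth.paused) {
    synth.pause();
    synth.resume();
  }
}, 10000);
// Call clearInterval(keepAliveTimer) when reading stops
7. synth.cancel() Triggers onerror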
⚠️ GOTCHA: Calling `synth.cancel()` fires the `onerror` event on any active utterance with error type `'canceled'` or `'interrupted'`.
Solution: Use a guard flag or check error type:
u.onerror = function(e) {
  // cancel() is a deliberate stop, not a failure, so ignore these two error types
  if (e.error === 'canceled' || e.error === 'interrupted') return;
  stopReading();
};
🏗️ Architecture Pattern
Minimal TTS Reader (Copy-Paste Starting Point)
┌─────────────────────────────────────────┐
│ IIFE                                    │
│                                         │
│  ┌─ Feature Detection ───┐              │
│  │ speechSynthesis?      │              │
│  └───────────┬───────────┘              │
│              ▼                          │
│  ┌─ Content Extraction ──┐              │
│  │ Clone → Strip → Clean │              │
│  └───────────┬───────────┘              │
│              ▼                          │
│  ┌─ Chunking Engine ─────┐              │
│  │ Split on sentences    │              │
│  │ Max 2500 chars        │              │
│  └───────────┬───────────┘              │
│              ▼                          │
│  ┌─ Utterance Builder ───┐              │
│  │ Set voice/rate/pitch  │              │
│  │ Chain via onend       │              │
│  └───────────┬───────────┘              │
│              ▼                          │
│  ┌─ Player UI ───────────┐              │
│  │ Bar: play/pause/stop  │              │
│  │ Progress indicator    │              │
│  │ Trigger button        │              │
│  └───────────┬───────────┘              │
│              ▼                          │
│  ┌─ Keep-Alive Timer ────┐              │
│  │ pause/resume @ 10s    │              │
│  └───────────────────────┘              │
└─────────────────────────────────────────┘
Lifecycle
Init → Detect → Inject Trigger Button
                    │
              User clicks ▶
                    │
    Extract Text → Chunk → Build Utterances
                    │
          synth.speak(chunk[0])
                    │
chunk[0].onend → speak(chunk[1]) → ... → speak(chunk[N])
        │                                     │
Keep-Alive Timer running               chunk[N].onend
        │                                     │
User clicks ⏸ → synth.pause()           stopReading()
User clicks ▶ → synth.resume()          cleanup UI
User clicks ■ → synth.cancel()

Implementation Checklist
For TTS Reader
- [ ] Feature detection (`speechSynthesis` in window)
- [ ] Content container identified (ID or selector)
- [ ] Strip list defined (what to remove before reading)
- [ ] Chunk size set (default 2500)
- [ ] Voice selection logic (language-specific)
- [ ] Player bar UI (play/pause/close + progress)
- [ ] Trigger button injected (topbar or floating)
- [ ] Chrome keep-alive timer (10s interval)
- [ ] `onerror` guard (handle cancel/interrupted)
- [ ] `beforeunload` cleanup
- [ ] `prefers-reduced-motion` respect
- [ ] Mobile safe-area padding
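
Two of the checklist items, chaining chunks via `onend` and guarding `onerror`, can be sketched together. This is an illustration under assumptions: `speakChunks`, its `options` fields, and the `onDone` callback are hypothetical names. The synthesizer and the utterance constructor are passed in as parameters so the sketch can be exercised with stand-ins outside a browser.

```javascript
// Hypothetical helper: speaks chunks one after another.
// synth: a SpeechSynthesis-like object; Utterance: a SpeechSynthesisUtterance-like constructor.
function speakChunks(synth, Utterance, chunks, options = {}) {
  let index = 0;
  let stopped = false;

  function finish() {
    if (stopped) return;
    stopped = true;
    if (options.onDone) options.onDone(); // e.g. clean up the player UI
  }

  function speakNext() {
    if (stopped) return;
    if (index >= chunks.length) { finish(); return; }
    const u = new Utterance(chunks[index]);
    if (options.lang) u.lang = options.lang;
    if (options.voice) u.voice = options.voice;
    u.rate = options.rate || 1;
    // Chain: when this chunk ends, move on to the next one.
    u.onend = () => { index += 1; speakNext(); };
    u.onerror = (e) => {
      // cancel() reports 'canceled' or 'interrupted'; that is not a failure.
      if (e.error === 'canceled' || e.error === 'interrupted') return;
      finish();
    };
    synth.speak(u);
  }

  speakNext();
  // stop() sets the flag first so a late onend cannot restart the chain.
  return { stop() { stopped = true; synth.cancel(); } };
}
```

In a page this would be called as `speakChunks(window.speechSynthesis, SpeechSynthesisUtterance, chunks, { lang: 'vi-VN' })`, and the returned `stop()` could also be wired to a `beforeunload` listener.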
For Audio Player
- [ ] Audio files hosted and accessible
- [ ] Preload strategy (`none` → load on demand)
- [ ] Play/pause toggle with state management
- [ ] Progress bar with `currentTime`/`duration`
- [ ] Error handling (network, format, autoplay policy)
- [ ] Session state (dismissed = don't show again)
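
For the progress bar item, one detail is easy to miss: an audio element's `duration` is `NaN` until metadata has loaded and can be `Infinity` for streams. A small sketch of a guard follows; the helper name `progressPercent` is hypothetical.

```javascript
// Hypothetical helper: converts playback position into a 0-100 percentage.
function progressPercent(currentTime, duration) {
  // duration is NaN before metadata loads and Infinity for live streams
  if (!Number.isFinite(duration) || duration <= 0) return 0;
  const pct = (currentTime / duration) * 100;
  return Math.min(100, Math.max(0, pct)); // clamp against rounding overshoot
}
```

It would typically be called from a `timeupdate` listener on the audio element to set the width of the progress bar.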
For Voice CRO
- [ ] Per-page config object (delay, scroll threshold, audio URLs)
- [ ] Trigger conditions (time + scroll AND/OR interaction)
- [ ] Bottom sheet UI (icon, text, CTA, dismiss)
- [ ] Player bar UI (toggle, progress, CTA button)
- [ ] Session dismissal tracking
- [ ] Stats tracking (shown/listened/dismissed)
- [ ] No conflict with TTS Reader
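
The per-page config and trigger conditions might be modelled as below. Everything here is a placeholder for illustration: the field names (`delayMs`, `scrollThreshold`, `requireBoth`, `audioUrl`), the page path, the audio URL, and the function `shouldTrigger` are assumptions, not the skill's actual schema.

```javascript
// Hypothetical per-page config; all values are placeholders.
const croConfig = {
  '/pricing': {
    delayMs: 8000,          // minimum time on page
    scrollThreshold: 0.4,   // fraction of the page scrolled (0 to 1)
    requireBoth: true,      // true = time AND scroll, false = time OR scroll
    audioUrl: '/audio/pricing.mp3',
  },
};

// Hypothetical check: decides whether the bottom sheet may be shown.
function shouldTrigger(config, state) {
  // Never show after a dismissal, and never compete with the TTS reader.
  if (state.dismissed || state.ttsActive) return false;
  const timeOk = state.elapsedMs >= config.delayMs;
  const scrollOk = state.scrollRatio >= config.scrollThreshold;
  return config.requireBoth ? timeOk && scrollOk : timeOk || scrollOk;
}
```

Keeping the decision in a pure function makes the trigger rules easy to test separately from scroll and timer listeners.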
⚠️ Common Pitfalls
| Pitfall | Symptom | Fix |
|---|---|---|
| Chrome stops after 15s | Audio cuts mid-sentence | Keep-alive timer (pause/resume) |
| `synth.cancel()` fires `onerror` | Settings sheet closes immediately | Guard flag or check error type |
| Voices not loaded | No voice available | Listen for voiceschanged event |
| Chunk too large | Utterance fails silently | Max 2500 chars per chunk |
| Reading CTA text | TTS reads "Đặt Lịch Ngay" ("Book Now") | Strip non-content elements |
| Autoplay blocked | Audio won't start on mobile | Require user interaction first |
| Multiple audio conflicts | TTS + CRO play simultaneously | Mutual exclusion check |
| No cleanup on nav | Audio keeps playing | beforeunload → synth.cancel() |
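
The mutual exclusion check from the table could be sketched as a tiny shared lock. The name `createAudioLock` and its methods are hypothetical: each engine registers a stop callback before it starts playing, and whichever engine was playing before gets stopped.

```javascript
// Hypothetical coordinator: only one audio source holds the lock at a time.
function createAudioLock() {
  let owner = null; // { name, stop } of the engine currently playing
  return {
    // Call right before starting playback; stops the previous owner if different.
    acquire(name, stop) {
      if (owner && owner.name !== name) owner.stop();
      owner = { name, stop };
    },
    // Call when playback ends; only the current owner can release.
    release(name) {
      if (owner && owner.name === name) owner = null;
    },
    current() {
      return owner ? owner.name : null;
    },
  };
}
```

For example, the TTS reader might call `lock.acquire('tts', stopReading)` before speaking, and the CRO player would do the same with its own pause function.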
Multi-Language Support
Voice selection by language:
├─ Vietnamese: v.lang === 'vi-VN' || v.lang.startsWith('vi')
├─ English: v.lang === 'en-US' || v.lang.startsWith('en')
├─ Japanese: v.lang === 'ja-JP' || v.lang.startsWith('ja')
├─ Korean: v.lang === 'ko-KR' || v.lang.startsWith('ko')
└─ Any: Pass language code as config parameter
Set utterance.lang to match the content language for correct pronunciation.
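
Combining the language matching above with the voice selection priority, a selector could be sketched like this. The name `pickVoice` is hypothetical, and the sketch assumes that the first two priority levels mean an exact language match. Normalizing underscores is a defensive assumption, since some platforms have been reported to return tags such as `en_US`.

```javascript
// Hypothetical helper: local exact match, then network exact match,
// then any voice sharing the language prefix, then null (browser default).
function pickVoice(voices, lang) {
  const wanted = lang.toLowerCase();
  const prefix = wanted.split('-')[0];
  const norm = (v) => v.lang.replace('_', '-').toLowerCase();
  const exact = voices.filter((v) => norm(v) === wanted);
  return (
    exact.find((v) => v.localService) ||               // 1. local service voice
    exact[0] ||                                        // 2. network voice
    voices.find((v) => norm(v).startsWith(prefix)) ||  // 3. prefix match
    null                                               // 4. browser default
  );
}
```

In a page the list would come from `speechSynthesis.getVoices()`, which can be empty until the `voiceschanged` event has fired, so the call is best made (or repeated) inside that event's handler.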
Reference Files
| File | Content |
|---|---|
| tts-engine.md | Complete SpeechSynthesis API reference, chunking strategies, voice selection |
| audio-player.md | HTMLAudioElement patterns, preload strategies, error handling |
| voice-cro.md | Trigger system, bottom sheet patterns, CRO analytics |
| ui-patterns.md | Player bar CSS, bottom sheet CSS, animations, responsive design |
Reference Implementations
| File | Description |
|---|---|
| examples/blog-reader.js | Complete TTS reader, Substack-style, 350 LOC |
| examples/voice-cro.js | Complete Voice CRO trigger system, 390 LOC |
Remember: Voice is the most personal interface. A well-placed audio feature can increase engagement 3-5x. But unwanted audio is the fastest way to lose a user. Always require user initiation. Never autoplay.