
📋 Full Skill Source: This is the complete, unedited SKILL.md file. Nothing is hidden or summarized.


CM ReadIt: Web Audio Experience Skill

Philosophy: Reading is passive. Listening is intimate. Voice builds trust faster than any headline.

Core Principle: Zero dependencies. Progressive enhancement. Respect the user's device and preferences.


🎯 Selective Reading Rule (MANDATORY)

| File | Status | When to Read |
| --- | --- | --- |
| tts-engine.md | 🔴 REQUIRED | Adding TTS / read-aloud to any page |
| audio-player.md | ⚪ Optional | Pre-recorded MP3 playback |
| voice-cro.md | ⚪ Optional | Trigger-based voice sales / CRO |
| ui-patterns.md | ⚪ Optional | Player bar & bottom sheet design |

🔴 tts-engine.md = ALWAYS READ when implementing TTS. Others = only if relevant.


Quick Decision Tree

"I need audio on my website"
│
├─ Read article content aloud (text-to-speech)
│  └─ Use: TTS Engine → tts-engine.md
│     ├─ Blog / article pages → Content Reader pattern
│     ├─ Documentation → Section Reader pattern
│     └─ E-commerce → Product Description Reader pattern
│
├─ Play pre-recorded audio files (MP3/WAV)
│  └─ Use: Audio Player → audio-player.md
│     ├─ Podcasts / interviews → Playlist pattern
│     ├─ Sales pitch / welcome → Triggered playback
│     └─ Background ambient → Loop pattern
│
├─ Voice-based conversion optimization (CRO)
│  └─ Use: Voice CRO → voice-cro.md
│     ├─ Landing pages → Trigger-based bottom sheet
│     ├─ Service pages → Per-page audio scripts
│     └─ Course pages → Social proof audio
│
└─ Combination (TTS + CRO)
   └─ Read tts-engine.md + voice-cro.md
      └─ Ensure no conflict (TTS reader vs CRO player)
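
One way to handle the "ensure no conflict" branch is a small shared arbiter that both engines consult before playing. The sketch below is an assumption about how that could look, not the skill's own implementation: `createAudioArbiter`, `claim`, `release`, and `current` are hypothetical names.

```javascript
// Hypothetical helper: only one audio engine may hold the "floor" at a time.
function createAudioArbiter() {
    let active = null; // { name, stop } of the engine currently playing

    return {
        // An engine calls claim() right before it starts playback.
        claim(name, stop) {
            if (active && active.name !== name) {
                active.stop(); // e.g. () => synth.cancel() or () => audio.pause()
            }
            active = { name, stop };
        },
        // An engine calls release() when it finishes or is dismissed.
        release(name) {
            if (active && active.name === name) active = null;
        },
        current() {
            return active ? active.name : null;
        },
    };
}
```

In a page that loads both features, the TTS reader would call `claim('tts', ...)` on play and the CRO player would call `claim('cro', ...)`, so whichever starts last silences the other.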

🧠 Core Principles (Internalize These)

1. The 3 Audio Engines

| Engine | API | Source | Best For |
| --- | --- | --- | --- |
| TTS Reader | SpeechSynthesis | Page text content | Blogs, articles, docs |
| Audio Player | HTMLAudioElement | Pre-recorded MP3 | Sales, podcasts, guides |
| Voice CRO | Audio + triggers | MP3 + behavior detection | Landing pages, sales |

2. Progressive Enhancement

Feature detection → Graceful degradation → Never break the page

if (!('speechSynthesis' in window)) return;  // TTS
if (!window.Audio) return;                    // Audio

Rule: Audio features are ENHANCEMENTS. The page must function 100% without them.
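
The two guards above can be folded into one reusable check. This is a minimal sketch under assumptions: `detectAudioSupport` is a hypothetical name, and the `win` parameter is injected (instead of reading the global `window`) only so the function can also run outside a browser.

```javascript
// Sketch: report which audio engines the current environment can support.
function detectAudioSupport(win) {
    const hasTTS = Boolean(
        win && 'speechSynthesis' in win && 'SpeechSynthesisUtterance' in win
    );
    const hasAudio = Boolean(win && typeof win.Audio === 'function');
    return { tts: hasTTS, audio: hasAudio };
}

// Browser usage (initReader and initPlayer are hypothetical entry points):
// const support = detectAudioSupport(window);
// if (support.tts) initReader();
// if (support.audio) initPlayer();
```

If neither flag is set, the script simply does nothing and the page renders as usual.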

3. Content Extraction Principle

Clone → Strip → Clean → Split → Speak

DON'T read the raw DOM.
DO clone, remove noise, extract clean text.

Strip list (always remove before speaking):

  • CTAs, promotions, ads
  • Navigation, footer, sidebar
  • Images, videos, iframes, SVGs
  • Scripts, styles, hidden elements
  • Tags, badges, metadata
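
The Clone, Strip, and Clean steps could be sketched as follows. The selector list is an assumption that mirrors the strip list above; class names such as `.cta` or `.badge` are placeholders to adapt to the site's own markup, and `cleanText` / `extractReadableText` are hypothetical names.

```javascript
// Assumed strip list; adjust the selectors to the site's own markup.
const STRIP_SELECTORS = [
    'nav', 'footer', 'aside', 'script', 'style', 'img', 'video',
    'iframe', 'svg', '[hidden]', '.cta', '.promo', '.ad', '.tag', '.badge',
];

// Clean step: collapse whitespace so the speech engine gets tidy text.
function cleanText(raw) {
    return raw
        .replace(/\u00a0/g, ' ')     // non-breaking spaces become plain spaces
        .replace(/[ \t]+/g, ' ')     // collapse runs of spaces and tabs
        .replace(/\s*\n\s*/g, '\n')  // trim around line breaks, drop blank lines
        .trim();
}

// Clone and Strip steps: work on a copy so the visible page is untouched.
function extractReadableText(container, selectors = STRIP_SELECTORS) {
    const clone = container.cloneNode(true);
    clone.querySelectorAll(selectors.join(',')).forEach((el) => el.remove());
    return cleanText(clone.textContent || '');
}
```

In a browser this would be called as `extractReadableText(document.querySelector('article'))`, where the `article` selector is again only an example.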

4. The Chunking Problem

Browsers do not handle long utterances reliably. In practice, speech can cut off or fail silently beyond roughly 3000-5000 characters, depending on browser, OS, and voice, so long text must be split into chunks.

Split Strategy:
├─ Split on sentence boundaries (. ! ? \n)
├─ Max chunk: 2500 chars (a conservative limit below that range)
├─ Preserve sentence integrity (never split mid-sentence)
└─ Chain chunks via the onend callback
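
A minimal sketch of the split strategy, assuming a simple regex is good enough to find sentence boundaries. `splitIntoChunks` is a hypothetical name, and the hard-split fallback for a single sentence longer than the limit is an addition not described above.

```javascript
// Sketch: pack whole sentences into chunks of at most maxLen characters.
function splitIntoChunks(text, maxLen = 2500) {
    // A sentence ends at . ! ? (plus trailing whitespace) or at a line break.
    const sentences = text.match(/[^.!?\n]+[.!?]*\s*|[.!?]+\s*/g) || [];
    const chunks = [];
    let current = '';

    for (const sentence of sentences) {
        // Close the current chunk if this sentence would overflow it.
        if (current && current.length + sentence.length > maxLen) {
            chunks.push(current.trim());
            current = '';
        }
        if (sentence.length > maxLen) {
            // Fallback (assumption): hard-split one oversized sentence.
            for (let i = 0; i < sentence.length; i += maxLen) {
                chunks.push(sentence.slice(i, i + maxLen).trim());
            }
        } else {
            current += sentence;
        }
    }
    if (current.trim()) chunks.push(current.trim());
    return chunks.filter(Boolean);
}
```

Because sentences are concatenated back in order, tokens the regex splits too eagerly (decimals such as "3.14", abbreviations) are only separated when they happen to fall on a chunk boundary.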

5. Voice Selection Priority

Language voices:
1. Local service voice (faster, works offline)
2. Network voice (higher quality, needs internet)
3. Any voice matching language prefix
4. null (browser default)
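
The priority list could translate into a small selection function like the one below. It assumes steps 1 and 2 apply to voices whose `lang` matches the requested tag exactly, which is one reading of the list; `pickVoice` is a hypothetical name. The underscore normalization covers browsers that report tags like `en_US`.

```javascript
// Sketch: choose a voice for `lang` (e.g. 'vi-VN') from getVoices() output.
function pickVoice(voices, lang) {
    const wanted = lang.toLowerCase();
    const prefix = wanted.split('-')[0];
    const norm = (v) => v.lang.replace('_', '-').toLowerCase();
    const exact = voices.filter((v) => norm(v) === wanted);

    return (
        exact.find((v) => v.localService) ||               // 1. local voice
        exact.find((v) => !v.localService) ||              // 2. network voice
        voices.find((v) => norm(v).startsWith(prefix)) ||  // 3. prefix match
        null                                               // 4. browser default
    );
}
```

A `null` result means the utterance's `voice` is left unset, so the browser falls back to its default voice for `utterance.lang`.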

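6. Chrome Keep-Alive Bug

⚠️ CRITICAL: Chrome silently stops SpeechSynthesis after ~15 seconds of continuous speech. This is the #1 gotcha.

javascript
// Workaround: pause/resume every 10s so Chrome does not cut the utterance off.
const synth = window.speechSynthesis;
const keepAlive = setInterval(() => {
    // Only nudge the engine while it is actively speaking.
    if (synth.speaking && !synth.paused) {
        synth.pause();
        synth.resume();
    }
}, 10000);
// Call clearInterval(keepAlive) when reading stops, so the timer does not leak.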

7. synth.cancel() Triggers onerror

⚠️ GOTCHA: Calling synth.cancel() fires the onerror event on any active utterance, with error type 'canceled' or 'interrupted'.

Solution: Use a guard flag or check the error type:

javascript
u.onerror = function (e) {
    // cancel() is a deliberate stop, not a failure: ignore it.
    if (e.error === 'canceled' || e.error === 'interrupted') return;
    // Any other error is a real failure: tear down the reader.
    stopReading();
};

🏗️ Architecture Pattern

Minimal TTS Reader (Copy-Paste Starting Point)

┌──────────────────────────────────────────┐
│                   IIFE                   │
│                                          │
│  ┌─ Feature Detection ────┐              │
│  │  speechSynthesis?      │              │
│  └──────────┬─────────────┘              │
│             ▼                            │
│  ┌─ Content Extraction ───┐              │
│  │  Clone → Strip → Clean │              │
│  └──────────┬─────────────┘              │
│             ▼                            │
│  ┌─ Chunking Engine ──────┐              │
│  │  Split on sentences    │              │
│  │  Max 2500 chars        │              │
│  └──────────┬─────────────┘              │
│             ▼                            │
│  ┌─ Utterance Builder ────┐              │
│  │  Set voice/rate/pitch  │              │
│  │  Chain via onend       │              │
│  └──────────┬─────────────┘              │
│             ▼                            │
│  ┌─ Player UI ────────────┐              │
│  │  Bar: play/pause/stop  │              │
│  │  Progress indicator    │              │
│  │  Trigger button        │              │
│  └──────────┬─────────────┘              │
│             ▼                            │
│  ┌─ Keep-Alive Timer ─────┐              │
│  │  pause/resume @ 10s    │              │
│  └────────────────────────┘              │
└──────────────────────────────────────────┘

Lifecycle

Init → Detect → Inject Trigger Button
         │
   User clicks ▶
         │
   Extract Text → Chunk → Build Utterances
         │
   synth.speak(chunk[0])
         │
   chunk[0].onend → speak(chunk[1]) → ... → speak(chunk[N])
         │                                        │
   Keep-Alive Timer running                   chunk[N].onend
         │                                        │
   User clicks ⏸ → synth.pause()             stopReading()
   User clicks ▶ → synth.resume()            cleanup UI
   User clicks ✕ → synth.cancel()
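
The chaining part of this lifecycle, including the guard against `cancel()`-triggered errors, might look like the sketch below. `speakChunks` is a hypothetical name; `synth` and `makeUtterance` are injected so the logic can be exercised without a browser. In a page they would be `window.speechSynthesis` and `(text) => new SpeechSynthesisUtterance(text)`.

```javascript
// Sketch: speak chunks one after another, calling onDone after the last one.
function speakChunks(synth, chunks, makeUtterance, onDone) {
    let index = 0;
    let stopped = false;

    function finish() {
        if (stopped) return;
        stopped = true;
        onDone(); // natural end or real error: let the caller clean up the UI
    }

    function speakNext() {
        if (stopped) return;
        if (index >= chunks.length) return finish();
        const u = makeUtterance(chunks[index++]);
        u.onend = speakNext; // chain: the next chunk starts when this one ends
        u.onerror = (e) => {
            // cancel() reports 'canceled' / 'interrupted'; not a real failure.
            if (e.error === 'canceled' || e.error === 'interrupted') return;
            finish();
        };
        synth.speak(u);
    }

    speakNext();
    return {
        stop() {
            stopped = true; // guard flag: late onend/onerror events are ignored
            synth.cancel();
        },
    };
}
```

In this sketch `stop()` does not call `onDone`; the assumption is that the close button's own handler resets the UI.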

📝 Implementation Checklist

For TTS Reader

  • [ ] Feature detection (speechSynthesis in window)
  • [ ] Content container identified (ID or selector)
  • [ ] Strip list defined (what to remove before reading)
  • [ ] Chunk size set (default 2500)
  • [ ] Voice selection logic (language-specific)
  • [ ] Player bar UI (play/pause/close + progress)
  • [ ] Trigger button injected (topbar or floating)
  • [ ] Chrome keep-alive timer (10s interval)
  • [ ] onerror guard (handle cancel/interrupted)
  • [ ] beforeunload cleanup
  • [ ] prefers-reduced-motion respect
  • [ ] Mobile safe-area padding

For Audio Player

  • [ ] Audio files hosted and accessible
  • [ ] Preload strategy (none → load on demand)
  • [ ] Play/pause toggle with state management
  • [ ] Progress bar with currentTime/duration
  • [ ] Error handling (network, format, autoplay policy)
  • [ ] Session state (dismissed = don't show again)
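
For the progress bar item, one detail worth handling is that an audio element's `duration` is `NaN` until metadata has loaded. The helper below is a small sketch of that; `progressState` is a hypothetical name, and its inputs are meant to be `audio.currentTime` and `audio.duration`.

```javascript
// Sketch: turn currentTime/duration into a bar width and a time label.
function progressState(currentTime, duration) {
    // duration is NaN before metadata loads and Infinity for live streams.
    const known = Number.isFinite(duration) && duration > 0;
    const percent = known ? Math.min(100, (currentTime / duration) * 100) : 0;
    const fmt = (seconds) => {
        const total = Math.max(0, Math.floor(seconds));
        return Math.floor(total / 60) + ':' + String(total % 60).padStart(2, '0');
    };
    return {
        percent,
        label: known ? fmt(currentTime) + ' / ' + fmt(duration) : fmt(currentTime),
    };
}
```

A `timeupdate` listener on the audio element could call this and write `percent` into the bar's width.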

For Voice CRO

  • [ ] Per-page config object (delay, scroll threshold, audio URLs)
  • [ ] Trigger conditions (time + scroll AND/OR interaction)
  • [ ] Bottom sheet UI (icon, text, CTA, dismiss)
  • [ ] Player bar UI (toggle, progress, CTA button)
  • [ ] Session dismissal tracking
  • [ ] Stats tracking (shown/listened/dismissed)
  • [ ] No conflict with TTS Reader
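
To make the first two checklist items concrete, here is a hedged sketch of a per-page config and a trigger check. Every field name and value is hypothetical, and reading "time + scroll AND/OR interaction" as "(time and scroll), combined with interaction by AND or OR" is an assumption.

```javascript
// Hypothetical per-page config; names and values are illustrative only.
const croConfig = {
    delayMs: 15000,        // minimum time on page
    scrollThreshold: 0.4,  // fraction of the page scrolled (0 to 1)
    mode: 'or',            // 'and' = also require interaction, 'or' = either
    audioUrls: ['/audio/welcome.mp3'],
};

// Sketch: decide whether the bottom sheet should be shown right now.
function shouldTrigger(config, state) {
    // Never show after a dismissal, and never talk over the TTS reader.
    if (state.dismissed || state.ttsActive) return false;
    const timeOk = state.elapsedMs >= config.delayMs;
    const scrollOk = state.scrollFraction >= config.scrollThreshold;
    const behaviorOk = timeOk && scrollOk;
    const interacted = state.interacted === true;
    return config.mode === 'and' ? behaviorOk && interacted : behaviorOk || interacted;
}
```

The `state` object would be maintained by scroll, timer, and click listeners; keeping the decision in a pure function makes the trigger rules easy to adjust per page.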

⚠️ Common Pitfalls ​

PitfallSymptomFix
Chrome stops after 15sAudio cuts mid-sentenceKeep-alive timer (pause/resume)
synth.cancel() fires onerrorSettings sheet closes immediatelyGuard flag or check error type
Voices not loadedNo voice availableListen for voiceschanged event
Chunk too largeUtterance fails silentlyMax 2500 chars per chunk
Reading CTA textTTS reads "Đặt Lα»‹ch Ngay"Strip non-content elements
Autoplay blockedAudio won't start on mobileRequire user interaction first
Multiple audio conflictsTTS + CRO play simultaneouslyMutual exclusion check
No cleanup on navAudio keeps playingbeforeunload β†’ synth.cancel()
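
For the "Voices not loaded" row, the voice list is often empty on the first `getVoices()` call and filled in later. A possible wrapper, with a timeout as a safety net, is sketched here. `loadVoices` and the 2000 ms default are assumptions, and `synth` is injected so the function can be tested with a stub.

```javascript
// Sketch: resolve with the voice list once it is available (or on timeout).
function loadVoices(synth, timeoutMs = 2000) {
    return new Promise((resolve) => {
        const ready = synth.getVoices();
        if (ready.length > 0) return resolve(ready);

        let settled = false;
        const finish = () => {
            if (settled) return;
            settled = true;
            clearTimeout(timer);
            if (typeof synth.removeEventListener === 'function') {
                synth.removeEventListener('voiceschanged', finish);
            }
            resolve(synth.getVoices()); // may still be empty after a timeout
        };
        // Some browsers never fire the event, so fall back to a timeout.
        const timer = setTimeout(finish, timeoutMs);
        if (typeof synth.addEventListener === 'function') {
            synth.addEventListener('voiceschanged', finish);
        }
    });
}
```

Typical use would be `loadVoices(window.speechSynthesis).then((voices) => ...)` before the first utterance is built; an empty result simply means the browser default voice is used.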

🌐 Multi-Language Support

Voice selection by language:
├─ Vietnamese: v.lang === 'vi-VN' || v.lang.startsWith('vi')
├─ English: v.lang === 'en-US' || v.lang.startsWith('en')
├─ Japanese: v.lang === 'ja-JP' || v.lang.startsWith('ja')
├─ Korean: v.lang === 'ko-KR' || v.lang.startsWith('ko')
└─ Any: Pass language code as config parameter

Set utterance.lang to match the content language for correct pronunciation.


📚 Reference Files

| File | Content |
| --- | --- |
| tts-engine.md | Complete SpeechSynthesis API reference, chunking strategies, voice selection |
| audio-player.md | HTMLAudioElement patterns, preload strategies, error handling |
| voice-cro.md | Trigger system, bottom sheet patterns, CRO analytics |
| ui-patterns.md | Player bar CSS, bottom sheet CSS, animations, responsive design |

🔗 Reference Implementations

| File | Description |
| --- | --- |
| examples/blog-reader.js | Complete TTS reader, Substack-style, 350 LOC |
| examples/voice-cro.js | Complete Voice CRO trigger system, 390 LOC |

Remember: Voice is the most personal interface. A well-placed audio feature can increase engagement 3-5x. But unwanted audio is the fastest way to lose a user. Always require user initiation. Never autoplay.
