Reel Transcriber Bot
Instagram reel transcription via Telegram bot
Problem
I consume a lot of short-form video content for research and learning. Watching every reel fully is slow, and rewatching to catch a specific detail is slower. I needed a way to turn any reel into text I could read, save, and reference later.
Reading a transcript takes seconds. Watching a 90-second reel twice takes three minutes.
Stack
Make.com orchestrates the entire pipeline. A RapidAPI service downloads the reel from Instagram. The Gemini API processes the audio and generates a structured transcript with analysis. The bot processes only the audio track, not the video frames. Input and output both go through the Telegram Bot API.
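To make the Telegram output side concrete, here is a minimal Python sketch of preparing a transcript for the Bot API's sendMessage method, which accepts at most 4096 characters of text per message. The real pipeline does this inside a Make.com HTTP module, and nothing above says the scenario splits long replies, so the splitting and the function names (split_for_telegram, build_send_message_payloads) are assumptions for illustration only.

```python
TELEGRAM_TEXT_LIMIT = 4096  # maximum characters in one sendMessage text


def split_for_telegram(text: str, limit: int = TELEGRAM_TEXT_LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring line breaks."""
    chunks = []
    remaining = text
    while len(remaining) > limit:
        # Prefer to cut at the last newline inside the allowed window.
        cut = remaining.rfind("\n", 0, limit)
        if cut <= 0:
            cut = limit  # no usable newline, so cut hard at the limit
        chunks.append(remaining[:cut])
        remaining = remaining[cut:].lstrip("\n")
    if remaining:
        chunks.append(remaining)
    return chunks


def build_send_message_payloads(chat_id: int, text: str) -> list[dict]:
    """Build one sendMessage JSON body per chunk (hypothetical helper)."""
    return [{"chat_id": chat_id, "text": chunk} for chunk in split_for_telegram(text)]
```

Each payload would be posted to the bot's sendMessage endpoint in order, so a long transcript arrives as several consecutive messages.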
Flow diagram
7 steps, triggered instantly
Webhook → HTTP (RapidAPI download) → HTTP (process media) → HTTP (download file, conditional filter) → Resume (error handling) → HTTP (Gemini API transcription) → HTTP (send to Telegram)
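
The same flow can be sketched in Python to show how the steps hand data to each other. This is only an approximation of the Make.com scenario: the step-to-line mapping is my reading of the diagram, and the injected callables (resolve_media, download_file, transcribe, send) are hypothetical stand-ins for the HTTP modules. The only real API detail used is the shape of a Telegram update (message.chat.id and message.text).

```python
from typing import Callable, Optional


def run_pipeline(
    update: dict,
    resolve_media: Callable[[str], Optional[str]],
    download_file: Callable[[str], bytes],
    transcribe: Callable[[bytes], str],
    send: Callable[[int, str], None],
) -> str:
    """Run one reel through the pipeline and return the text sent to the user."""
    message = update["message"]                # step 1: webhook payload from Telegram
    chat_id = message["chat"]["id"]
    reel_url = message.get("text", "").strip()

    try:
        media_url = resolve_media(reel_url)    # steps 2-3: RapidAPI download, process media
        if not media_url:                      # step 4: conditional filter
            raise ValueError("no downloadable media")
        audio = download_file(media_url)       # step 4: download the file
        reply = transcribe(audio)              # step 6: Gemini transcription
    except Exception as exc:                   # step 5: error handling with a fallback reply
        reply = f"Could not transcribe this reel: {exc}"

    send(chat_id, reply)                       # step 7: send to Telegram
    return reply
```

Passing the steps in as functions keeps the sketch testable without network access; in Make.com the equivalent wiring is done by mapping one module's output into the next.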

~30 seconds per reel. No scheduling.
The output has four sections: Transcript, Summary, Tone and audience, and Takeaways.
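A quick way to check that a response actually follows this structure is to look for the four headings. The sketch below assumes each section starts with its name at the beginning of a line, which is a guess about the format and not something the pipeline is known to enforce; missing_sections is a hypothetical helper.

```python
REQUIRED_SECTIONS = ("Transcript", "Summary", "Tone and audience", "Takeaways")


def missing_sections(output: str) -> list[str]:
    """Return the required section names that never appear at the start of a line."""
    present = set()
    for line in output.splitlines():
        stripped = line.strip().lower()
        for name in REQUIRED_SECTIONS:
            if stripped.startswith(name.lower()):
                present.add(name)
    return [name for name in REQUIRED_SECTIONS if name not in present]
```

An empty result means all four sections were found; anything else could trigger a retry or a warning appended to the Telegram reply.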

Prompt iterations
The first Gemini prompt returned raw, unformatted text. Round 1 defined explicit output sections (transcript, summary, tone and audience, takeaways). Round 2 tuned the handling of overlapping audio, background music, and unclear speech so the model flags gaps instead of guessing.
Hardest part: getting a consistent output format across different reel styles (talking head, voiceover, interview).
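For illustration, a prompt along these lines could be assembled as below. The wording is not the actual production prompt, which is not reproduced here; the [unclear] marker and the build_prompt helper are assumptions that only show the two ideas from the iterations: fixed section headings and flagging gaps.

```python
SECTIONS = ("Transcript", "Summary", "Tone and audience", "Takeaways")


def build_prompt(sections: tuple[str, ...] = SECTIONS) -> str:
    """Assemble a transcription prompt with fixed sections and a gap-flagging rule."""
    lines = [
        "Transcribe the audio of the attached reel.",
        "Respond with exactly these sections, in this order, "
        "each starting with its heading on its own line:",
    ]
    # Numbering the headings makes the expected order explicit.
    lines += [f"{i}. {name}" for i, name in enumerate(sections, start=1)]
    lines += [
        "If speech overlaps, is covered by music, or is unclear, "
        "write [unclear] at that point instead of guessing.",
        "Use the same format for talking-head, voiceover, and interview reels.",
    ]
    return "\n".join(lines)
```

Keeping the section list in one constant means the prompt and any downstream format check can share the same source of truth.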
Failures fixed
RapidAPI failures
Instagram URLs don't always resolve. Added validation checks before the download step and fallback error messages sent back to the user in Telegram.
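A validation check of this kind might look like the sketch below. The accepted URL shapes (/reel/, /reels/, and /p/ paths on instagram.com) are an assumption about what users send, not a complete description of Instagram's URL scheme, and the error text and helper names are hypothetical.

```python
import re
from typing import Optional

# Assumed shapes: instagram.com/reel/<code>/, /reels/<code>/, or /p/<code>/,
# optionally followed by a query string such as share-tracking parameters.
REEL_URL = re.compile(
    r"^https?://(?:www\.)?instagram\.com/(?:reel|reels|p)/([A-Za-z0-9_-]+)/?(?:\?.*)?$"
)


def extract_shortcode(text: str) -> Optional[str]:
    """Return the reel shortcode if the text looks like a reel link, else None."""
    match = REEL_URL.match(text.strip())
    return match.group(1) if match else None


def validation_reply(text: str) -> Optional[str]:
    """Return an error message for the user, or None when the link looks valid."""
    if extract_shortcode(text) is None:
        return "That does not look like an Instagram reel link. Please send the full reel URL."
    return None
```

Rejecting malformed input before calling the download service avoids spending an API call on a link that can never resolve, and gives the user an immediate answer.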
Timeout on longer reels
Reels over 60 seconds exceeded the Make.com execution window. Adjusted the timeout settings and added a Resume module for recovery.
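The recovery idea, as I understand Make.com's Resume handler, is to substitute a fallback result when a step fails so the rest of the scenario still runs. A rough Python analogue is sketched below; call_with_resume is a hypothetical helper, and the actual timeout values used in the scenario are not stated here.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_resume(step: Callable[[float], T], timeout_s: float, fallback: T) -> tuple[T, bool]:
    """Run `step` with a timeout budget; on failure, continue with `fallback`.

    Returns (value, recovered), where recovered is True if the fallback was used.
    """
    try:
        return step(timeout_s), False
    except (TimeoutError, ConnectionError):
        # Swap in a substitute result so later steps
        # (such as sending the reply) still run.
        return fallback, True
```

The step receives its time budget as an argument, so a longer reel can be given a larger timeout without changing the recovery logic.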
Result
Used multiple times a day. Send a link, get a structured transcript in ~30 seconds. Replaced watching, rewatching, and manual note-taking. Running in production on v2 with no interruptions.