The Verdict
A professional-quality YouTube workflow for solo creators: publish weekly while holding down a full-time job.
Publishing a YouTube video used to require a scriptwriter, voice artist, graphic designer, and video editor. Today, one person with this stack ships a professional-quality video per week while keeping their day job.
This stack compresses the production timeline from two weeks to five days. ChatGPT and Claude research and script your video. Canva designs thumbnails that stop viewers mid-scroll. ElevenLabs produces human-quality voice-overs without hiring a voice actor. Descript auto-edits your footage (transcribes, detects silences, syncs visuals to pacing). What used to require a team of four is now a solo operation.
The economics change everything. At £80/month in tools, you can invest 10 hours/week on one video, publish weekly, and build an audience that eventually becomes a business (sponsorships, digital products, course sales).
The 5-Day Video Creation Workflow
Day 1 (90 minutes): Research and scripting
- You have a video idea (or you use ChatGPT to brainstorm ideas based on your channel topic).
- Claude researches the topic. You ask: “I want to make a video about why most people fail at learning a language. Research the psychology and neuroscience. Give me talking points.”
- Claude delivers a 2,000-word research doc with citations in 90 seconds.
- ChatGPT converts this into a tight video script (1,200 words, paced for 12 minutes).
- You spend 30 minutes editing the script to match your voice.
- End of day: Finalised script ready for voice recording.
Day 2 (75 minutes): Thumbnail and graphics
- You ask Canva to design 3 thumbnail options for your video.
- Canva has templates for your genre (educational, tech, etc.). You customise them: headline, background image, your photo. 20 minutes per thumbnail.
- You post all 3 to Discord or Slack. Friends vote. You pick the winner.
- While thumbnails are exporting, you line up background footage (stock video, screen recordings, gameplay). A rough ElevenLabs read-through of the script tells you how much footage the voice-over will need.
- End of day: Finalised thumbnail and graphics asset list.
Day 3 (120 minutes): Voice-over recording
- ElevenLabs converts your script into a human-quality voice-over in your chosen voice and accent. (Options: US English, UK English, Australian, etc.)
- Most creators use ElevenLabs’ Studio mode (paste script, click generate, 5-minute turnaround).
- You download the MP3. 10 minutes total, including tweaking speed/tone if needed.
- Descript imports the MP3 and auto-generates a transcript and edit points.
- End of day: Professional voice-over, transcribed and timed.
Day 4 (180 minutes): Video assembly
- Descript auto-aligns your voice-over with your video footage. It detects natural edit points (pauses in speech, scene changes).
- You watch the auto-edited result. It’s 70% there. You manually adjust 10-15 edit points (tighten pacing, remove dead air, emphasise key moments).
- You add background music (Epidemic Sound or Artlist; both integrate with Descript). You add the thumbnail.
- You export. 1080p, optimal settings for YouTube. Descript handles all of this.
- End of day: Final video file ready for upload.
Day 5 (90 minutes): Upload, thumbnail, metadata
- Upload to YouTube. Add finalised thumbnail.
- Write SEO-optimised title and description (Claude helps with this).
- Add chapters from Descript’s auto-generated transcript.
- Schedule for 9am in your target audience’s timezone.
- Share with your email list and social channels. Done.
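The chapters step can be sketched in a few lines. This is a minimal illustration, assuming you have already picked chapter start times (in seconds) and titles from the transcript; the function names and sample chapters are hypothetical. It formats them as the timestamp lines YouTube reads from a description, which must start at 0:00 and include at least three entries.

```python
def format_timestamp(seconds: int) -> str:
    """Render seconds as M:SS, or H:MM:SS for videos over an hour."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def chapters_block(chapters: list[tuple[int, str]]) -> str:
    """Build description lines such as '4:10 Why it happens'."""
    if len(chapters) < 3:
        raise ValueError("YouTube chapters need at least three timestamps")
    if chapters[0][0] != 0:
        raise ValueError("the first chapter must start at 0:00")
    starts = [start for start, _ in chapters]
    if starts != sorted(starts):
        raise ValueError("chapters must be in ascending order")
    return "\n".join(f"{format_timestamp(s)} {title}" for s, title in chapters)


# Hypothetical chapter list for a 12-minute video.
EXAMPLE_CHAPTERS = [
    (0, "Hook"),
    (45, "The problem"),
    (250, "Why it happens"),
    (520, "Solutions"),
]

if __name__ == "__main__":
    print(chapters_block(EXAMPLE_CHAPTERS))
```

Paste the printed lines into the video description; the titles themselves still need a human pass.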
Total creator time: roughly 9 hours spread across 5 days. Published video.
Traditional timeline (doing it manually): 40-50 hours, two weeks, burnt out afterwards.
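As a quick sanity check, the per-day budgets from the day headings can be totalled. The sketch below uses only those five figures; the dictionary labels are shorthand, not official names. The total comes to a little over nine hours.

```python
# Minutes per day, copied from the day headings in the 5-day workflow.
DAY_MINUTES = {
    "Day 1: research and scripting": 90,
    "Day 2: thumbnail and graphics": 75,
    "Day 3: voice-over": 120,
    "Day 4: video assembly": 180,
    "Day 5: upload and metadata": 90,
}


def total_hours(day_minutes: dict[str, int]) -> float:
    """Sum the per-day minutes and convert the result to hours."""
    return sum(day_minutes.values()) / 60


if __name__ == "__main__":
    print(f"Total: {total_hours(DAY_MINUTES):.2f} hours")  # Total: 9.25 hours
```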
Stack Breakdown
Research & Scripting Foundation: Claude
Claude (Pro, £16/month) is your research partner and first-draft scriptwriter. It handles the thinking work that would otherwise eat 4-5 hours per video.
Why Claude (not ChatGPT) for research:
- Claude produces longer, more nuanced research docs (ChatGPT gets verbose and repetitive)
- Claude’s claims are more accurate (fewer hallucinated facts)
- Claude excels at synthesising multiple sources and presenting them clearly
Weekly use:
- “Research why most people quit learning languages after 3 months. Include psychological factors, neurological factors, social factors. Structure it for a 12-minute video.”
- Claude delivers a 2,500-word research doc in 2 minutes with citations you can verify.
- You skim it. Spot-check one or two facts. (They’re accurate.)
- You understand the topic deeply. Now you can write with authority.
For educational creators, this is game-changing. Your video is always backed by research, not guesses.
Budget: £16/month ($20 USD)
Rapid Brainstorming & Scripts: ChatGPT
ChatGPT (Plus, $20/month USD, approximately £16 GBP) is your ideation engine and rapid scriptwriter. It’s faster than Claude for quick iterations.
Why ChatGPT (not Claude) for scripts:
- Faster turnaround (usually 20-30 seconds)
- Better at punchy, conversational writing (your YouTube voice)
- Strong at pacing (it understands video length and viewer attention)
Weekly workflow:
- You have a topic but no script. Paste it into ChatGPT: “Write a 12-minute video script about [topic]. Tone: friendly, conversational, occasional humour. Hook viewers in the first 20 seconds. Structure: problem → explanation → solutions.”
- ChatGPT delivers a 1,200-word script in 60 seconds.
- You spend 20 minutes editing it to match your voice (every creator has a unique cadence; you’re just fixing ChatGPT’s to match yours).
- Script is done.
Or, if you’re stuck for ideas: “I make videos about [your niche]. Generate 10 video ideas that could get 100k+ views. Rank them by virality potential.”
ChatGPT returns 10 solid ideas with hooks for each. You pick one and start scripting.
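If you reuse the scripting prompt every week, it can be templated. The sketch below is one way to do that. The helper names are hypothetical, and the default of 100 words per minute is simply the pace implied by the 1,200-word, 12-minute figures above, so treat it as an assumption to tune against your own narration speed.

```python
def target_word_count(minutes: float, words_per_minute: int = 100) -> int:
    """Estimate script length for a given running time.

    100 wpm is an assumption derived from this article's
    1,200 words / 12 minutes; adjust it to your narration pace.
    """
    return round(minutes * words_per_minute)


def build_script_prompt(
    topic: str,
    minutes: int = 12,
    tone: str = "friendly, conversational, occasional humour",
) -> str:
    """Fill in the scripting prompt from the weekly workflow."""
    words = target_word_count(minutes)
    return (
        f"Write a {minutes}-minute video script about {topic}. "
        f"Target roughly {words:,} words. "
        f"Tone: {tone}. "
        "Hook viewers in the first 20 seconds. "
        "Structure: problem → explanation → solutions."
    )


if __name__ == "__main__":
    print(build_script_prompt("why most people quit learning languages"))
```

Stating the word target in the prompt makes it easier to spot a draft that runs long before it reaches the voice-over stage.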
Budget: $20/month USD (approximately £16 GBP after conversion)
Thumbnail Design: Canva
Canva Pro (£13/month) is your graphic design studio. YouTube thumbnails are how videos get clicked. Good thumbnails can 2-3x your CTR.
Why Canva (not Photoshop):
- YouTube-optimised templates (1280×720px pre-sized)
- Stock images included (millions of options)
- No design experience required (templates do the heavy lifting)
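If you export thumbnails from more than one tool, a small check on dimensions can catch a wrong-sized file before upload. This is a sketch only: the 1280×720 size is the one cited above, the 640-pixel minimum width is an assumption to confirm against YouTube’s current guidance, and the function name is hypothetical.

```python
from fractions import Fraction

RECOMMENDED_SIZE = (1280, 720)  # size cited in this article
MIN_WIDTH = 640                 # assumed minimum; confirm with YouTube's guidance


def thumbnail_issues(width: int, height: int) -> list[str]:
    """Return a list of problems with a thumbnail's pixel dimensions."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    issues = []
    if Fraction(width, height) != Fraction(16, 9):
        issues.append(f"{width}x{height} is not 16:9")
    if width < MIN_WIDTH:
        issues.append(f"width {width}px is below {MIN_WIDTH}px")
    return issues


if __name__ == "__main__":
    print(thumbnail_issues(*RECOMMENDED_SIZE))  # []
    print(thumbnail_issues(1080, 1080))         # ['1080x1080 is not 16:9']
```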
Weekly thumbnail workflow:
- You’ve written your script. You know the hook (the main idea that gets people to click).
- Open Canva. Search “YouTube thumbnail”. Browse 500+ templates.
- Pick one that matches your style. Customise: headline (usually 3-5 words), background image, add your face/logo.
- Generate 3 variations. Different colours, different headlines, different layouts. 20 minutes total.
- Pick the best one (or ask your Discord community to vote). Done.
Canva Pro includes: millions of stock images (no additional cost), 1TB storage, brand kit (save your colours and fonts for consistency across videos), and Magic Resize (automatically resize your thumbnail for different platforms).
Budget: £13/month
Voice-Over Production: ElevenLabs
ElevenLabs (Creator plan, £11/month) produces synthetic voice-overs that sound human. Most viewers can’t tell it’s AI-generated.
Why ElevenLabs (not generic TTS):
- Voice sounds natural, not robotic (the 2024+ voices are genuinely convincing)
- Supports emotional tone (sad, excited, thoughtful)
- Multiple accents and languages (UK English, US English, Australian, Spanish, French, etc.)
- Studio mode lets you adjust pacing and tone
Weekly workflow:
- Paste your finalised script into ElevenLabs Studio.
- Select your preferred voice (browse 10+ options, pick one that matches your channel vibe).
- Click generate. Wait 2-3 minutes for a 12-minute voice-over.
- Listen. Adjust the speed if the pacing feels off (1.0x to 1.2x speed range). Re-generate if needed.
- Download MP3. Done.
The Creator plan (£11/month) includes 100,000 characters/month, enough for 2-3 videos per week (assuming 10-12 minute videos).
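The character quota can be turned into a rough video count. The sketch below assumes about six characters per word (letters plus the trailing space), which is an approximation rather than an ElevenLabs figure, and it ignores re-generated takes, which would reduce the count.

```python
def videos_per_month(
    char_quota: int,
    words_per_script: int,
    chars_per_word: float = 6.0,
) -> int:
    """Estimate how many full scripts fit in a monthly character quota.

    chars_per_word is an assumed average (letters plus a space).
    Re-generated takes are not counted here.
    """
    chars_per_script = words_per_script * chars_per_word
    return int(char_quota // chars_per_script)


if __name__ == "__main__":
    monthly = videos_per_month(100_000, 1_200)
    weeks_per_month = 52 / 12
    print(f"{monthly} videos/month, about {monthly / weeks_per_month:.1f} per week")
```

With a 1,200-word script this works out to 13 videos a month, or about three a week before any retakes, which is in line with the 2-3 per week quoted above.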
Why use AI voice-over instead of recording yourself:
- Your voice gets fatigued. AI doesn’t.
- You can publish 2 videos per week without recording sessions exhausting you.
- Consistency (every video has the same voice, building brand recognition).
- Accessibility (hearing-impaired viewers can read the auto-transcript).
Most solo creators record their own voice because it feels authentic. But AI voices are now good enough that authenticity isn’t the constraint anymore. Speed and consistency are.
Budget: £11/month (Creator plan)
Video Editing & Auto-Production: Descript
Descript (Creator plan, £24/month) is where the magic happens. It auto-edits your video based on the voice-over.
What Descript does:
- Auto-transcription: Upload your voice-over. Descript transcribes it in 2-3 minutes.
- Auto-edit points: It detects natural pauses and scene changes and suggests edit points.
- Automatic silence removal: Removes dead air, filler words (“um”, “uh”), and awkward pauses.
- Visual synchronisation: Automatically clips your background footage to match the voice-over pacing.
- Caption generation: Auto-generates captions (huge for YouTube SEO and accessibility).
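To make the pause and filler-word steps concrete, here is a toy version working on word-level timestamps. It is not Descript’s actual algorithm. The `Word` type, the 0.6-second threshold, and the filler list are all assumptions chosen for illustration.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    start: float  # seconds from the start of the audio
    end: float


FILLERS = {"um", "uh"}  # illustrative list, taken from the examples above


def find_pauses(words: list[Word], min_gap: float = 0.6) -> list[tuple[float, float]]:
    """Return (start, end) spans where the gap between words is at least min_gap."""
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        if nxt.start - prev.end >= min_gap:
            pauses.append((prev.end, nxt.start))
    return pauses


def drop_fillers(words: list[Word]) -> list[Word]:
    """Remove filler words, ignoring case and trailing punctuation."""
    return [w for w in words if w.text.lower().strip(",.") not in FILLERS]


EXAMPLE = [
    Word("Most", 0.0, 0.3),
    Word("um,", 0.35, 0.5),
    Word("people", 0.55, 0.8),
    Word("quit", 2.0, 2.4),
]

if __name__ == "__main__":
    print(find_pauses(EXAMPLE))                   # [(0.8, 2.0)]
    print([w.text for w in drop_fillers(EXAMPLE)])  # ['Most', 'people', 'quit']
```

Each pause span is a candidate cut or B-roll transition; deciding which ones to keep is the manual part of the edit.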
Typical workflow:
- Import your voice-over MP3 and video footage into Descript.
- Descript auto-transcribes and auto-generates edit recommendations. Takes 5 minutes.
- Watch the preview. It’s usually 60-70% perfect out of the box.
- Manually adjust 10-20 edit points (tighten transitions, emphasise key moments, remove filler).
- Add background music (Descript has music integrated; you can license tracks).
- Add text overlays and effects (optional, but adds production value).
- Export at 1080p. Done.
For a solo creator, Descript is the difference between shipping a video weekly or struggling to ship one per month.
Budget: £24/month (Creator plan includes 1,800 media minutes/month)
Total Monthly Cost
| Tool | Plan | Cost GBP | Cost USD | Notes |
|---|---|---|---|---|
| Claude | Pro | £16 | $20 | Shared across content |
| ChatGPT | Plus | £16 | $20 | Shared across content |
| Canva | Pro | £13 | $16.49 | For thumbnails and graphics |
| ElevenLabs | Creator | £11 | $13.20 | Enough for 2-3 videos/week |
| Descript | Creator | £24 | $30 | Includes monthly media minutes |
| Total | | £80 | $99.69 | |
Cost per video (assuming 4 videos/month): £20 per video in tool costs. Negligible at scale.
Cost per 1,000 views (typical creator): £0.50-2.00 depending on your niche and audience size. Most creators earn back tool costs within their first 10-20 videos.
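The arithmetic behind these two figures can be reproduced from the table. In the sketch below the GBP prices come from the table. The view counts of 10,000 and 40,000 per video are assumed values, chosen because they produce the two ends of the £0.50-2.00 range.

```python
# Monthly prices in GBP, copied from the cost table.
MONTHLY_COST_GBP = {
    "Claude": 16,
    "ChatGPT": 16,
    "Canva": 13,
    "ElevenLabs": 11,
    "Descript": 24,
}


def cost_per_video(monthly_costs: dict[str, float], videos_per_month: int) -> float:
    """Spread the monthly tool bill evenly across the videos published."""
    return sum(monthly_costs.values()) / videos_per_month


def cost_per_thousand_views(video_cost: float, views: int) -> float:
    """Tool cost attributable to each 1,000 views of one video."""
    return video_cost / (views / 1000)


if __name__ == "__main__":
    per_video = cost_per_video(MONTHLY_COST_GBP, videos_per_month=4)
    print(f"Per video: £{per_video:.2f}")  # Per video: £20.00
    for views in (10_000, 40_000):  # assumed view counts
        print(f"{views:,} views: £{cost_per_thousand_views(per_video, views):.2f} per 1,000")
```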
How They Connect: The Real Creation Workflow
Monday morning (15 minutes):
- You check your YouTube analytics. Last week’s video did well (50k views). Comments suggest topics people care about.
- You drop a few topic ideas into ChatGPT. “Based on this comment from a viewer, generate 5 video ideas.”
- ChatGPT generates ideas. You pick one.
Monday afternoon (90 minutes):
- You ask Claude: “Research [topic]. Give me 5 key talking points with sources.”
- Claude delivers research.
- You paste the research and topic into ChatGPT: “Write a 12-minute script.”
- ChatGPT delivers script. You edit for 20 minutes.
Tuesday morning (90 minutes):
- You record B-roll (background footage, screen recordings, etc.). You don’t need to be on camera; your voice-over carries the video.
- Or, you use stock footage (Unsplash, Pexels, Pixabay all have free options; Canva has paid stock integrated).
- You design 3 thumbnail options in Canva. Community votes on the best one.
Tuesday afternoon (60 minutes):
- Paste your script into ElevenLabs. Select your voice. Generate voice-over. Download MP3.
Wednesday morning (180 minutes):
- Import voice-over and footage into Descript.
- Descript auto-transcribes and auto-edits.
- You manually adjust 15 edit points and add music.
- Export.
Wednesday afternoon (60 minutes):
- Upload to YouTube.
- Add thumbnail, title, description, and chapters.
- Schedule for Thursday 9am.
Thursday:
- Video goes live. You promote on social channels. Answer comments.
Time investment: 8 hours across 4 days.
Real Scaling Path: One Video Per Week
Month 1–2:
- Publish 1 video every 2 weeks (you’re learning the tools).
- Total time: 12-15 hours per video (inefficient at first).
- Total views: 500-2,000 per video.
- Revenue: £0 (building audience).
Month 3–6:
- Publish 1 video per week (you’re faster now).
- Total time: 8-10 hours per video.
- Total views: 5,000-20,000 per video (audience growing).
- Revenue: Sponsorships or YouTube Partner Program (requires 1,000 subscribers + 4,000 watch hours).
Month 7–12:
- Publish 1-2 videos per week (with the workflow down).
- Total time: 5-8 hours per video.
- Total views: 20,000-100,000+ per video (viral hits occasionally).
- Revenue: £300-1,000/month from YouTube Partner Program + sponsorships.
Month 13+:
- Publish 2 videos per week consistently.
- Audience: 50,000-500,000 subscribers.
- Revenue: £1,000-10,000/month (Partner Program + sponsorships + digital products).
This is not fantasy. Hundreds of creators using this stack report these numbers.
Why This Stack Works for Beginners
No expensive equipment: You don’t need a £5k camera setup. Your iPhone records fine. Descript works with any video file.
No hiring: You don’t need a voice artist, editor, or designer. This stack is all of them.
Forgiving of imperfection: You don’t need a polished speaking voice, because ElevenLabs narrates for you. Your video footage can be poorly lit or jerky. Descript can reframe and add effects.
Scales from part-time to full-time: You can produce 1 video/week while keeping your job. That’s 4 videos/month, 48 per year. Most successful YouTube channels have 50-100 videos in their first year.
Low financial risk: £80/month is your entire tool cost. You could go from zero to 100k subscribers while spending less than £2,000 on tools.
Real Content Examples This Stack Is Built For
Educational channels: “Why you’re bad at math”, “How relativity works”, “Business fundamentals”
Self-improvement: “How to build habits”, “Why procrastination is hard to beat”, “How to learn faster”
Technical tutorials: “Python for beginners”, “SQL optimisation”, “Web development fundamentals”
Analysis channels: “Why this company failed”, “Breaking down investor pitches”, “Industry trends analysis”
Documentary-style: “The history of [technology/company/concept]”, “How [product] was made”
This stack is NOT ideal for:
- Vlogging (daily life, travel vlogging) — requires a different workflow
- Gaming channels — Descript works but ElevenLabs voice-over doesn’t fit the vibe
- Channels requiring extensive custom cinematography — you’re still shooting everything yourself
Workflow Optimisations as You Scale
3 videos/week:
- Batch script writing (write 3 scripts on Monday, record 3 voice-overs Tuesday)
- Batch thumbnail design (design 9 thumbnails on Friday for next week)
- Descript assembly becomes mechanical (same steps, just repeated)
Hiring editor (when revenue justifies it):
- You write script and record voice-over
- Editor assembles video in Descript (you’ve saved them 50% of the work)
- You review and approve
- Cost: £200-400/month for part-time editor
At this point, you’re at £80/month in tools + £300/month for an editor = £380/month. If you’re earning £3k/month from the channel, it’s worth it.
Realistic Constraints
“AI voice-overs sound robotic.” In 2022, yes. In 2026, no. ElevenLabs’ voices are genuinely convincing. The constraint now is pacing and tone, not roboticism.
“Won’t YouTube’s algorithm penalise AI-generated voice?” No evidence of this. YouTube cares about watch time and engagement, not whether the voice is human. Plenty of creators with 500k+ subscribers use ElevenLabs.
“Descript’s auto-editing sometimes misses the mark.” True. It gets 60-70% right. You spend 15 minutes manually adjusting. This is still 10x faster than editing from scratch.
“I can’t make a good video if I can’t write.” This is the real constraint, not the tools. If you can’t write coherently, no tool fixes that. But if you can articulate an idea, Claude and ChatGPT help you shape it into a script.
“Canva thumbnails look generic.” They can, if you don’t customise them. Spend 5 extra minutes tweaking colours and text. Your thumbnail will outperform generic alternatives.
FAQ
Q: Should I use AI voice or record my own voice? A: If you’re publishing 1+ videos per week, use AI voice-over (ElevenLabs). If you’re publishing 1 per month, record your own (feels more authentic, builds parasocial connection). At a weekly cadence, speed and consistency are the constraint, not authenticity.
Q: Can I make money on YouTube immediately? A: No. You need 1,000 subscribers and 4,000 watch hours in the past 12 months to join the Partner Program (earn from ads). Takes most creators 3-6 months. Alternative: sponsorships (brands pay you directly) starting at month 2-3. Alternative: digital products and courses (sell to your audience directly).
Q: What if my video underperforms? A: Welcome to YouTube. A 50k-view video is a win. A 5k-view video is learning (the hook didn’t work). You analyse why, tweak your next video, and try again. This happens in week 2, not month 6, because your production cost is so low (time, not money).
Q: Should I batch-produce videos (e.g., record 4 in one session)? A: Only if it’s voice-over (batch script, batch record). Video assembly should be spread across the week (time to think between editing sessions). Record all 4 voice-overs in one 4-hour session. Assemble them across the week.
Q: What about video length? Should I target 10 min, 20 min? A: YouTube’s algorithm favours 10-15 minute videos for educational content (higher completion rate). Adjust your script length accordingly. This stack works for 8-20 minute videos equally well.
Q: How do I pick my first 5 video topics? A: Pick topics you can explain well (knowledge edge) that people search for (demand). Use Google Trends or YouTube search auto-complete. “How to [popular problem]” and “[Concept] explained” are reliable formats. Your first video won’t go viral. Make it for yourself, not for YouTube. By video 5, you’ll know your channel’s vibe.
Q: Can I make money from a niche channel (5,000 subscribers)? A: Absolutely. A 5k-subscriber channel in a high-value niche (finance, B2B, legal) can earn £500-2k/month from sponsorships. Audience quality matters more than size for sponsorship revenue.
Get Started
Next steps: Track your video performance, iterate on thumbnails and hooks, and publish consistently. By month 6, you’ll have data on what your audience likes. Use that data to double down.