My Video Production Stack (for YouTube)
There are two barriers to publishing videos on YouTube. The first one is getting over thinking you are cringe and the second one is video editing. You have to battle the first one on your own, but I’m hoping that with what I share below, you can ease the burden of the latter.
No-hands editing, processing and publishing
The goal of this video production stack is to take a raw video, then automatically edit it, transcribe it, come up with timestamped chapter markers for the YouTube description, and upload it to my YouTube channel. Basically, everything that happens after I hit stop on the recording is fully automatic.
This, of course, is very specific to the kinds of videos that I make, which are talking-head videos with a screen share. But that is a large enough genre on YouTube that this setup might be useful to others out there.
I have several different tools across multiple git repos to accomplish this. Today I am making all those repos public. Let me explain below what each one does.
The Complete Workflow
My YouTube publishing workflow consists of four main components, each handling a different part of the video processing pipeline. They’re written in Python and designed to work seamlessly together:
- color_edit - Intelligent video editing based on visual cues
- yt_chapter_maker - AI-powered chapter generation and title suggestions
- yt_upload - YouTube API integration for uploading
- video_upload_workflow - The orchestrator that ties everything together
Let me walk you through each component and how they work internally.
color_edit: Visual Cue-Based Video Editing
The first and most crucial tool in my stack is color_edit. This is the tool that automatically creates a fully edited video from my raw recording. Before I created this tool for myself, I used to spend multiple hours editing each video.
The key innovation here is using colored frames as editing markers. When recording, I display:
- Green frames to mark sections I want to keep
- Red frames to mark sections I want to remove
- Regular content is neither red nor green
This is much easier to demonstrate than to explain. Check out this video where I visually explain how this works.
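To make the idea concrete, here is a minimal sketch of how a frame could be labeled as a marker from its average color, and how per-frame labels could be collapsed into timed runs. This is not the actual color_edit code: the function names, the `dominance` and `min_brightness` thresholds, and the representation of a frame as a list of RGB tuples are all assumptions made for illustration.

```python
from itertools import groupby
from statistics import mean


def classify_frame(pixels, dominance=0.6, min_brightness=80):
    """Label a frame 'green', 'red', or 'content' from its average color.

    pixels: iterable of (r, g, b) tuples with 0-255 channels.
    The thresholds are illustrative guesses, not tuned values.
    """
    pixels = list(pixels)
    r = mean(p[0] for p in pixels)
    g = mean(p[1] for p in pixels)
    b = mean(p[2] for p in pixels)
    total = r + g + b
    if total == 0:
        return "content"  # a fully black frame is not a marker
    if g >= min_brightness and g / total >= dominance:
        return "green"
    if r >= min_brightness and r / total >= dominance:
        return "red"
    return "content"


def label_runs(labels, fps):
    """Collapse per-frame labels into (label, start_sec, end_sec) runs."""
    runs = []
    index = 0
    for label, group in groupby(labels):
        count = len(list(group))
        runs.append((label, index / fps, (index + count) / fps))
        index += count
    return runs
```

A real tool would decode frames from the video file first and would then need a rule for which runs of regular content each marker applies to. That rule is deliberately left out here.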
Another important function performed by this tool is the removal of silent parts of the video, also known as “dead air”.
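Dead-air detection can be sketched along these lines: measure loudness over short windows and report any quiet stretch longer than a minimum duration. Again, this is only an illustration under assumptions, not how color_edit actually does it. The function name, the RMS threshold, the window size, and the input format (a plain list of samples between -1.0 and 1.0) are all hypothetical.

```python
import math


def find_silences(samples, sample_rate, window_s=0.1,
                  threshold=0.02, min_silence_s=0.5):
    """Return (start_sec, end_sec) spans where the audio stays quiet.

    samples: sequence of floats in [-1.0, 1.0], mono.
    A window counts as quiet when its RMS level is below `threshold`.
    """
    window = max(1, round(sample_rate * window_s))
    silences = []
    start = None  # sample index where the current quiet run began
    for offset in range(0, len(samples), window):
        chunk = samples[offset:offset + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        if rms < threshold:
            if start is None:
                start = offset
        elif start is not None:
            # The quiet run just ended; keep it only if it was long enough.
            if (offset - start) / sample_rate >= min_silence_s:
                silences.append((start / sample_rate, offset / sample_rate))
            start = None
    # Handle a quiet run that lasts until the end of the recording.
    if start is not None and (len(samples) - start) / sample_rate >= min_silence_s:
        silences.append((start / sample_rate, len(samples) / sample_rate))
    return silences
```

The `min_silence_s` cutoff matters: cutting every tiny pause makes speech sound unnatural, so short gaps are left alone.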
The final output is a tightly edited video with no dead air and only the sections I explicitly marked as keepers.
yt_chapter_maker: AI-Powered Chapter Generation
Once the video is edited, yt_chapter_maker takes over to generate metadata. This tool is pretty simple. It prompts an LLM to look at the transcript for the video and then suggest a few options for titles as well as timestamped chapter markers that can go in the video description.
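The two ends of that step can be sketched as follows: turning a timestamped transcript into a prompt, and turning chapter suggestions into description lines. This is a rough sketch, not the actual yt_chapter_maker code. The function names and prompt wording are made up, the segments are assumed to be dicts with `start` and `text` keys (the shape Whisper's Python API uses for its segments), and the LLM call itself is left out.

```python
def format_timestamp(seconds):
    """Format seconds as M:SS, or H:MM:SS once past an hour."""
    seconds = int(seconds)
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def build_prompt(segments):
    """Build an LLM prompt from transcript segments (hypothetical wording)."""
    lines = [
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    ]
    return (
        "Suggest a few titles and timestamped chapter markers for this video.\n"
        "Transcript:\n" + "\n".join(lines)
    )


def format_chapters(chapters):
    """Render (start_seconds, title) pairs as description lines."""
    ordered = sorted(chapters)
    # YouTube only recognizes chapters when the first one starts at 0:00.
    if not ordered or ordered[0][0] != 0:
        raise ValueError("first chapter must start at 0:00")
    return "\n".join(f"{format_timestamp(s)} {title}" for s, title in ordered)
```

For example, `format_chapters([(0, "Intro"), (95, "Demo")])` produces two lines, `0:00 Intro` and `1:35 Demo`, ready to paste into the description.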
yt_upload: YouTube API Integration
The yt_upload tool handles the actual YouTube upload process using the YouTube API. You will need to get client secrets using the Google Cloud console in order to invoke the API. Also, the very first time you run it, you will have to do an OAuth dance through the browser.
The uploaded video is kept private so that you can look at the video and its metadata before publishing it.
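As a rough illustration of the private-by-default part, here is a sketch of the metadata that a YouTube Data API v3 `videos.insert` request carries. The helper name `build_upload_body` is hypothetical and this is not necessarily how yt_upload structures its code. The `snippet` and `status.privacyStatus` fields are part of the API's video resource.

```python
def build_upload_body(title, description, tags=None):
    """Build the video resource for an upload that stays private."""
    return {
        "snippet": {
            "title": title,
            "description": description,
            "tags": list(tags or []),
        },
        # Private until a human has reviewed the video and its metadata.
        "status": {"privacyStatus": "private"},
    }
```

With the google-api-python-client library, a body like this would typically be passed to `youtube.videos().insert(part="snippet,status", body=..., media_body=...)` after the OAuth step. Setting up the credentials is out of scope for this sketch.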
video_upload_workflow: The Orchestrator
Of course I don’t want to have to invoke all of the above steps individually every time. So I have a meta-orchestrator that runs the entire pipeline. It also takes care of transcribing the video using OpenAI’s whisper tool.
There’s some caching of intermediate outputs to make retries faster in case something breaks midway.
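The caching idea can be sketched like this: each step writes its result to a file, and a step is skipped when that file already exists. This is a minimal sketch under assumptions, not the actual video_upload_workflow code. The `run_cached` helper and the `.partial` naming are hypothetical.

```python
from pathlib import Path


def run_cached(output_path, step):
    """Run step(path) only if output_path does not exist yet.

    Returns True if the step ran and False if the cached output was reused.
    The step writes to a temporary ".partial" file that is renamed on
    success, so a crash midway never leaves a half-written cache entry.
    """
    output_path = Path(output_path)
    if output_path.exists():
        return False
    tmp = output_path.with_suffix(output_path.suffix + ".partial")
    step(tmp)
    tmp.replace(output_path)
    return True
```

A retry after a failure then re-runs only the steps whose outputs are missing, which is what makes a second attempt fast when, say, the upload fails after editing and transcription have already finished.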
Audio
Getting good audio is way more important than high-resolution video. People can watch a potato webcam video if it has decent audio, but will not stay for 2 seconds on a video shot on an 8K RED camera if the audio sounds like it is coming out of a tin can.
Fortunately, it is very hard to go wrong with any decent modern microphone. These are the mics I’m currently using (depending on where I am recording).
Experience
Various parts of the above setup came about organically over time.
color_edit is the oldest one. I’ve been using it for more than 4 years now and every video on my channel in that time frame has been automatically edited using it. I haven’t even been making any changes to it recently. It does its job well and without any fuss.
The other parts are more recent and were mostly cobbled together in the last year or so. All the same, at this point they have been used to publish many videos on my channel and I intend to keep using this pipeline.