Forbes | 27-03-2025

AI-Driven Video Production: Architecting The Next Generation Of Tools

Konstantin Hohr is the Founder and CEO of Revideo and a former Software Engineer at Jodel and DynamoAI.

The video content creation landscape is changing. While traditional timeline-based editors have served the industry well, they're increasingly insufficient for modern content demands, particularly when dealing with programmatically generated assets and complex animations. This article examines the technical infrastructure enabling the next generation of video creation tools, drawing from my experience building Revideo's programmatic video framework.

Video editing software traditionally works with timelines: strips of video, images and audio that can be arranged and modified. While this works for linear editing, it becomes limiting when trying to create dynamic content programmatically. My research has led me to a different approach: Instead of a timeline, code provides a more powerful way to describe how elements move and change over time. Using the HTML canvas as a universal render target, animations can be described through a TypeScript-based framework. This architecture offers several advantages. The canvas element provides a standardized, high-performance rendering API that is well supported across platforms. By expressing animations in TypeScript, you gain type safety and better tooling support, while enabling developers to create reusable components and complex animation patterns that would be unwieldy in traditional timeline-based editors.

Getting large language models (LLMs) to reliably generate code in a framework that isn't widely included in training data is surprisingly challenging. I have developed several strategies to make sure the output conforms to the desired syntax. First, context-enriched prompting is a must: providing parts of the framework's documentation to outline the available API, along with warnings about common pitfalls I've seen the model fall into, improves performance drastically.

Another way to improve the results is an error feedback loop, where the generated TypeScript code is transpiled and any syntax errors are fed back into the model. This allows the model to correct oversights without requiring the user to explicitly prompt for them. The trace from the TypeScript transpiler is usually enough to guide the LLM to a solution.

A straightforward approach to improving code accuracy is fine-tuning a model on exemplary code, incorporating past conversations into the training data. This helps reduce common hallucinations and mistakes. As with all LLM-driven products, it's crucial to collect output data early; once sufficient training data accumulates, it becomes a valuable source for improvement.

Most interestingly, though, since the available functions and parameters are already known ahead of code generation, there is also a case for structured output, where we build a context-free grammar (CFG) to statically define what code for a valid video might look like. OpenAI unfortunately only allows JSON to be generated this way, so I'm currently exploring ways to map from a given JSON schema back into TypeScript. Utilizing open-source models might be an easier path, though, since they allow modification of how tokens are sampled, making more complex rules easier to implement. Either way, this would make it practically impossible for the model to generate syntactically incorrect code.
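As a rough illustration of context-enriched prompting, the sketch below assembles a system prompt from a documentation excerpt and a list of known pitfalls before calling a model. The file path, pitfall list and the generateSceneCode helper are hypothetical placeholders, not Revideo's actual tooling.

```typescript
import { readFile } from "node:fs/promises";
import OpenAI from "openai";

const client = new OpenAI();

// Hypothetical pitfalls observed in past generations; the list is illustrative.
const KNOWN_PITFALLS = [
  "Drive animation generators with `yield*`; do not call them directly.",
  "Durations are expressed in seconds, not frames.",
];

// Assemble a system prompt from API documentation excerpts plus pitfall warnings.
async function buildSystemPrompt(): Promise<string> {
  const apiDocs = await readFile("docs/scene-api.md", "utf8"); // assumed doc excerpt
  return [
    "You generate TypeScript scene code for a canvas-based video framework.",
    "Only use the API described below.",
    "--- API REFERENCE ---",
    apiDocs,
    "--- COMMON PITFALLS ---",
    ...KNOWN_PITFALLS.map((p) => `- ${p}`),
  ].join("\n");
}

// Ask the model for scene code, grounded in the enriched system prompt.
async function generateSceneCode(userRequest: string): Promise<string> {
  const completion = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: await buildSystemPrompt() },
      { role: "user", content: userRequest },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```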
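The error feedback loop can be prototyped with the TypeScript compiler API: transpile the generated code with diagnostics enabled and, if anything is reported, send the messages back to the model for another attempt. A minimal sketch, assuming a code-generating helper like the generateSceneCode function sketched above:

```typescript
import ts from "typescript";

// Assumes a code-generating helper like the `generateSceneCode` sketch above.
declare function generateSceneCode(prompt: string): Promise<string>;

// Transpile only (no type checking); reportDiagnostics surfaces syntax errors.
function getSyntaxErrors(code: string): string[] {
  const result = ts.transpileModule(code, {
    reportDiagnostics: true,
    compilerOptions: { target: ts.ScriptTarget.ES2022, module: ts.ModuleKind.ESNext },
  });
  return (result.diagnostics ?? []).map((d) =>
    ts.flattenDiagnosticMessageText(d.messageText, "\n")
  );
}

// Ask the model, transpile, and feed any errors back until the code is clean
// or we run out of attempts.
async function generateWithFeedback(request: string, maxAttempts = 3): Promise<string> {
  let code = await generateSceneCode(request);
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    const errors = getSyntaxErrors(code);
    if (errors.length === 0) break;
    code = await generateSceneCode(
      `${request}\n\nThe previous code failed to transpile with these errors:\n` +
        errors.join("\n") +
        "\nPlease return a corrected version."
    );
  }
  return code;
}
```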
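One way to make the structured-output route concrete is to constrain the model to a small JSON schema of known elements and parameters, then map the validated JSON back into framework code. The sketch below is purely illustrative; makeScene, showText and playClip are placeholder names, not Revideo's actual API.

```typescript
// A deliberately small, hypothetical schema of allowed elements and parameters.
// In practice this shape would be handed to the model as a strict JSON schema
// (structured output), so the model can only emit known elements.
type SceneElement =
  | { kind: "text"; content: string; durationSeconds: number }
  | { kind: "video"; src: string; startSeconds: number; durationSeconds: number };

interface SceneSpec {
  width: number;
  height: number;
  elements: SceneElement[];
}

// Map one validated element to a line of (hypothetical) framework code.
function elementToCode(el: SceneElement): string {
  if (el.kind === "text") {
    return `  yield* showText(${JSON.stringify(el.content)}, ${el.durationSeconds});`;
  }
  return `  yield* playClip(${JSON.stringify(el.src)}, ${el.startSeconds}, ${el.durationSeconds});`;
}

// Because only known kinds and parameters can appear in the JSON, the emitted
// TypeScript is syntactically valid by construction.
function specToCode(spec: SceneSpec): string {
  const body = spec.elements.map(elementToCode).join("\n");
  return `export default makeScene(${spec.width}, ${spec.height}, function* () {\n${body}\n});`;
}
```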
Optimizing rendering speeds was an entirely different challenge. Let's take video-in-video as an example. To show a user-provided video file on the canvas, the standard HTML video element is a good starting point for a first implementation: to step through the video frame by frame during rendering, we can set its current time to the rendered frame's timestamp plus the offset of the video's start. This approach works, but it prompts the browser to re-seek the video from the last keyframe to the requested timestamp, redoing a lot of decoding work on every rendered frame. It becomes incredibly slow, so clearly a different approach is needed.

I solved this issue through a custom frame extraction system built on the WebCodecs API. Instead of relying on the browser's seeking implementation, you can process frames sequentially and reuse the work already done, significantly reducing computational overhead. Each extracted frame is painted to the canvas as an image. This optimization has yielded performance improvements of up to 100 times in extreme cases, enabling real-time preview capabilities even for complex compositions. The rendering pipeline operates in two modes: an interactive real-time preview and a final export render.
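For contrast, the seek-based starting point described above looks roughly like the following sketch. It assumes an already-loaded video element and a 2D canvas context; it is illustrative, not Revideo's actual renderer.

```typescript
// Naive approach: seek the <video> element to each frame's timestamp and draw it.
// Every assignment to currentTime forces the browser to re-seek from the last
// keyframe, so decoding work is repeated for every rendered frame.
async function drawFrameBySeeking(
  video: HTMLVideoElement,
  ctx: CanvasRenderingContext2D,
  frame: number,
  fps: number,
  clipStartSeconds: number
): Promise<void> {
  video.currentTime = clipStartSeconds + frame / fps;
  // Wait until the browser has finished seeking before painting.
  await new Promise<void>((resolve) =>
    video.addEventListener("seeked", () => resolve(), { once: true })
  );
  ctx.drawImage(video, 0, 0, ctx.canvas.width, ctx.canvas.height);
}
```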
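The WebCodecs-based extraction can be sketched as follows. WebCodecs does not demux container formats itself, so the sketch assumes the encoded chunks were already produced by a demuxer (for example, mp4box.js); the codec string and callback wiring are simplified for illustration rather than Revideo's exact implementation.

```typescript
// Decode encoded chunks sequentially and paint each frame to the canvas,
// instead of re-seeking a <video> element for every rendered frame.
async function extractAndPaintFrames(
  chunks: EncodedVideoChunk[], // assumed to come from a demuxer such as mp4box.js
  ctx: CanvasRenderingContext2D,
  onFrame: (frameIndex: number) => void
): Promise<void> {
  let frameIndex = 0;
  const decoder = new VideoDecoder({
    output: (frame: VideoFrame) => {
      // Frames arrive in presentation order; paint and release immediately.
      ctx.drawImage(frame, 0, 0, ctx.canvas.width, ctx.canvas.height);
      frame.close();
      onFrame(frameIndex++);
    },
    error: (e) => console.error("decode error", e),
  });

  // The codec string would normally be read from the container metadata.
  decoder.configure({ codec: "avc1.42E01E", codedWidth: 1920, codedHeight: 1080 });

  for (const chunk of chunks) {
    decoder.decode(chunk);
  }
  await decoder.flush(); // wait for all queued frames to be emitted
  decoder.close();
}
```

In a real pipeline, the decode queue would be throttled so each frame can be composited with the rest of the scene before the next one arrives.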

Through my work in this space, I've witnessed how video production tools are being fundamentally reshaped by the intersection of programmatic approaches and artificial intelligence. Exploring context-enriched prompting, error feedback loops and structured output has brought significant progress in reliable AI code generation, while the optimized rendering pipeline built on the WebCodecs API has solved critical performance challenges.

Programmatic video creation is at a turning point. The rise of sophisticated AI models presents a key challenge: How can we preserve the power of code-based approaches while making them accessible to creators across all skill levels? The technical solutions discussed here lay the groundwork for innovation, suggesting a future where video creation becomes not only more capable but also more approachable for everyone.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives. Do I qualify?
