According to Xinhua News Agency, Kuaishou’s “KLING” video generation model recently launched on its official website. Unlike video models from other companies, which have so far mainly been shown through demo videos, the newly unveiled KLING model is already open for beta testing on Kuaishou’s Movie App.
Photo/official website of KLING
According to the official website, Kuaishou has a strong track record in short-video technology, which gives its video generation model a wide range of application scenarios. The KLING model, developed by Kuaishou’s AI team, offers several advantages: it generates large-scale yet plausible motion, simulates characteristics of the physical world, shows strong conceptual combination and imagination, and produces videos at resolutions up to 1080p and lengths up to 2 minutes (at 30 fps), with support for various aspect ratios.
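To put the stated limits in concrete terms, the short Python sketch below works out how many frames a maximum-length clip contains and what the frame dimensions would be for a few aspect ratios. The helper names, the example aspect ratios, and the reading of “1080p” as 1080 pixels on the shorter side are assumptions made for illustration and do not come from Kuaishou.

```python
from typing import Tuple


def frame_count(duration_s: float, fps: int) -> int:
    """Number of frames in a clip of the given length and frame rate."""
    return round(duration_s * fps)


def frame_size(short_side: int, aspect_w: int, aspect_h: int) -> Tuple[int, int]:
    """Return (width, height) for an aspect ratio, assuming the shorter
    side of the frame has `short_side` pixels (an illustrative assumption)."""
    if aspect_w >= aspect_h:
        return (short_side * aspect_w // aspect_h, short_side)
    return (short_side, short_side * aspect_h // aspect_w)


# 2 minutes at 30 fps, the maximum quoted on the official website.
max_frames = frame_count(duration_s=2 * 60, fps=30)
print(max_frames)  # 3600

# Example aspect ratios; the article does not list which ones are supported.
for w, h in [(16, 9), (9, 16), (1, 1)]:
    print(f"{w}:{h}", frame_size(1080, w, h))
```

A single maximum-length clip therefore spans 3,600 frames, which gives a sense of how much temporal context the model has to keep consistent.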
The “KLING” model was developed in-house by Kuaishou’s AI team and builds on the company’s years of experience in video technology. It follows a technical route similar to that of “Sora,” combined with several proprietary innovations, with the aim of matching the quality of “Sora.”
KLING reportedly uses a Diffusion Transformer architecture similar to Sora’s, including a 3D spatio-temporal joint attention mechanism. This mechanism integrates temporal and spatial information so that video data is analyzed and processed as a whole.
It captures both local spatial features within each frame and temporal dynamics across frames, allowing the model to understand and reproduce motion in video more completely.
As a result, fast-moving objects, dramatically changing scenes, and complex human actions can all be captured precisely, making the generated video highly dynamic and realistic.
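The idea of joint spatio-temporal attention can be illustrated with a minimal pure-Python sketch. It is not KLING’s implementation, whose details have not been published; it only shows the general technique of flattening a small video of shape (frames, height, width) into one token sequence so that every token attends to every other token across both space and time. Learned query/key/value projections, multiple heads, positional encodings, and the diffusion process itself are deliberately left out, and all names here are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(video):
    """Self-attention over all (t, y, x) positions at once.

    video[t][y][x] is a feature vector (list of floats).
    Returns (output with the same nested shape, attention weight matrix).
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    # Flatten time and space into a single sequence of T*H*W tokens.
    tokens = [video[t][y][x] for t in range(T) for y in range(H) for x in range(W)]
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)

    weights, outputs = [], []
    for q in tokens:
        # Scaled dot-product score against every token in every frame.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * tokens[j][c] for j in range(len(tokens))) for c in range(d)]
        )

    # Restore the (T, H, W) layout.
    out = [
        [[outputs[(t * H + y) * W + x] for x in range(W)] for y in range(H)]
        for t in range(T)
    ]
    return out, weights


random.seed(0)
T, H, W, D = 3, 2, 2, 4  # tiny toy sizes, chosen only for illustration
video = [
    [[[random.gauss(0, 1) for _ in range(D)] for _ in range(W)] for _ in range(H)]
    for _ in range(T)
]
out, weights = joint_attention(video)
print(len(weights), len(weights[0]))  # 12 12
```

Because the weight matrix is 12 by 12 for a 3-frame, 2x2 input, each position draws on all positions in all frames in a single step. A factorized design would instead alternate separate spatial and temporal attention passes.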
Let’s take a look at the official demos. Note that the animated GIFs here are compressed and lower in quality than the original videos; for the best results, see the official website.
Prompt: Two flowers slowly blooming against a black background, revealing delicate petals and stamens.
Photo/official website of KLING
Prompt: A little white rabbit wearing glasses sits in a café chair reading a newspaper, with a hot cup of coffee on the table.
Photo/official website of KLING
Prompt: A hand pours milk from a stainless steel milk jug into a cup of coffee on the table, with a blurred kitchen in the background.
Photo/official website of KLING
With its strong grasp of text-to-video semantics and the capabilities of the Diffusion Transformer architecture, KLING can turn users’ rich imaginations into concrete visuals, creating scenes that would not exist in the real world.
Using proprietary 3D face and body reconstruction technology, combined with background stability and redirection modules, KLING can drive full facial expressions and body movements. With just one full-body photo, users can try a vivid “sing and dance” feature.
Photo/official website of KLING
Public records show that Kuaishou has previously released a general-purpose large language model, “Kuaiyi,” and a text-to-image large model product, “Ketú,” and has introduced key video technologies such as Direct-a-Video, Video-LaVIT, I2V-Adapter, and UNIAA, which have attracted widespread attention. With the release of the KLING large model, Kuaishou will reportedly continue to accelerate the development and application of large models, bringing more diverse AI creation and interactive experiences.