OpenAI’s text-to-video tool Sora has ignited excitement around video-generating AI tools. Photo: AFP

Tencent’s new AI video-generation tool brings static images to life in collaboration with Tsinghua and HKUST

  • Follow-Your-Click takes images combined with simple text prompts and turns them into short video clips with just a click
  • Tencent collaborated with researchers from universities in Hong Kong and Beijing amid growing excitement around AI video generation
Chinese internet giant Tencent Holdings introduced an image-to-video artificial intelligence (AI) model on Friday in collaboration with academic partners, a release that comes amid rising fervour around content-generating tools such as OpenAI’s ChatGPT and Sora.
The image-animation tool, called Follow-Your-Click and released on Microsoft’s open-source code repository GitHub, lets users transform a still image into a short animated video by clicking on a part of the picture and adding a simple text prompt describing how they would like it to move.

The project is a collaboration between Tencent’s Hunyuan team, the Hong Kong University of Science and Technology and Tsinghua University, one of mainland China’s top two universities in Beijing.


Tencent said it will release the full code for the model in April, but a demo is already available on GitHub. Researchers showcased some of its capabilities there, with one result showing how an image of a bird with the prompt “flap the wings” turned into a short MP4 file of a rainbow-coloured avian twitching one of its wings.

Another image of a girl standing outdoors with the simple one-word prompt “storm” turned into an animation with lightning flashing in the background.

Follow-Your-Click aims to address shortcomings of other image-to-video models on the market, which tend to animate entire scenes rather than specific objects in a picture, according to an academic paper by researchers from the three organisations. Such models also require users to write elaborate descriptions of how and where they want the image to move.

“Our framework has simpler yet precise user control and better generation performance than previous methods,” the researchers said in the paper published on Wednesday on arXiv, an online scientific paper repository.

Video generation has become a hot topic since Microsoft-backed OpenAI released its text-to-video Sora model, the impressive results of which led to a new round of soul searching within China’s AI industry as players seek to catch up in generative AI.

In the field of text- and image-to-video generation, Silicon Valley-based Pika Labs, co-founded by Guo Wenjing, a Chinese PhD candidate at Stanford University, is another rising star. The start-up has raised US$55 million in seed capital and Series A funding rounds from some of the biggest names in tech.

Tencent’s Chinese peers have also joined the race. Alibaba Group Holding, owner of the South China Morning Post, recently launched a portrait video-generation tool called EMO that turns images and audio prompts into videos that sing and talk.

Follow-Your-Click joins Tencent’s open-source text-to-video generation and editing toolbox, VideoCrafter2, which the tech giant released in January. VideoCrafter2 is an updated version of VideoCrafter1, released in October 2023, but is limited to clips just two seconds long.
