New AI Video Generator "Vidu" Competes with Sora

AI Search
28 Apr 2024, 11:37

TLDR

Chinese company Shu has introduced Vidu, an AI video generator it positions as a rival to OpenAI's Sora. Vidu is built on the Universal Vision Transformer (UViT) architecture, which combines diffusion and Transformer models to produce more coherent and accurate videos. The tool can generate a 16-second 1080p clip with one click, and its makers claim it is on par with Sora, although the generated videos show some inconsistencies. The core UViT technology was first proposed in September 2022, predating Sora's architecture. Interested users can apply for access on shangu.ai.com. Vidu adds to the global AI competition and offers an alternative to existing video generators such as Runway and Pika.

Takeaways

  • 🌟 A Chinese company, Shu, has announced a new AI video generator named Vidu, positioning it as a competitor to OpenAI's Sora.
  • 🎥 Vidu can generate a 16-second 1080p video clip with a single click.
  • 🤖 Built on a self-developed architecture called Universal Vision Transformer (UViT), Vidu integrates diffusion and Transformer AI models.
  • 🚀 The combination of diffusion and Transformer models is seen as an advance in generative AI that could overcome the limitations of earlier diffusion-only models.
  • 📈 The Transformer model, known for its context understanding capabilities, is expected to enhance the coherence and accuracy of generated content.
  • 📚 The core technology of UViT was first proposed by Vidu's research team in September 2022, before Sora's model architecture was introduced.
  • 🆚 In a direct comparison, Vidu's video quality appears competitive, though not necessarily on par with Sora, which has yet to be publicly released.
  • 👀 Vidu demonstrates a strong ability to generate realistic hands and other detailed elements in its videos.
  • 🔍 Despite some inconsistencies noticed in the generated videos, Vidu's results are considered notably better than current alternatives like Runway and Pika.
  • 🌐 Interested users can apply to use Vidu through the company's website, shangu.ai, where they can leave their contact information for access.
  • 🌐 China has recently been active in the AI space, with new advancements in AI models and robots, indicating a global competition in AI technology.

Q & A

  • What is the name of the AI video generator announced by the Chinese company Shu?

    -The AI video generator announced by the Chinese company Shu is called 'Vidu'.

  • What is the core technology behind Vidu's video generation?

    -Vidu is built on a self-developed vision transformer architecture called Universal Vision Transformer (UViT), which integrates two kinds of generative AI models: diffusion models and Transformers.

  • What are the potential advantages of merging a Transformer model with a diffusion model?

    -Merging a Transformer model with a diffusion model is considered the next step in generative AI, potentially allowing for more coherent and accurate video or image generation due to the Transformer's ability to understand context.

  • How does Vidu compare to other video generators in terms of generating hands?

    -Vidu generates hands very well, with five fingers on each hand and a realistic appearance, which is a significant improvement over other video generators.

  • What is the output resolution of Vidu's video generation?

    -Vidu can output 1080p videos, although the showreel shown in the video is only 720p.

  • How can one apply to use Vidu's AI video generator?

    -To apply to use Vidu's AI video generator, one can visit shangu.ai.com, scroll down to the video generation section, and fill out the form with their name, phone number, and company name.

  • What is the significance of the Universal Vision Transformer (UViT) in the context of AI video generation?

    -The Universal Vision Transformer (UViT) is significant because it combines the advantages of diffusion and Transformer models, potentially leading to more coherent and realistic video generation.

  • How does Vidu's video generation compare to SORA's in terms of realism?

    -While Vidu's showreel features high-quality, realistic videos, they are not yet on par with Sora's output. Sora has not been released for public use but is expected to set a high standard.

  • What are some of the limitations observed in Vidu's video generation compared to Sora?

    -Compared with Sora, Vidu's videos show inconsistencies such as hair turning into a red ribbon and a green leaf disappearing, problems that were not seen in Sora's examples.

  • What are some recent advancements in AI from China that have been mentioned in the script?

    -Recent advancements in AI from China mentioned in the video include the release of SenseTime's SenseNova 5.0 model and the unveiling of the S1 robot by Astribot.

  • What is the significance of competition in the AI video generation space?

    -Competition in the AI video generation space is significant as it drives innovation, improves technology, and can lead to better products and services for consumers.

Outlines

00:00

🚀 Introduction to a Sora Competitor: Vidu's AI Video Generator

The video introduces a new AI video generator from the Chinese company Shu, which presents it as a competitor to OpenAI's Sora. The tool, named Vidu, can generate a 16-second 1080p video clip with a single click. It is built on an architecture called Universal Vision Transformer (UViT), which combines the strengths of diffusion and Transformer models. This integration is presented as a significant advance in generative AI that could overcome the limitations of earlier models such as Stable Diffusion. The video also notes that Vidu's research team first proposed the core UViT technology before Sora's model architecture appeared. A showreel demonstrates Vidu's capabilities and suggests it could be a strong contender in AI video generation.

05:01

📊 Comparative Analysis of Vidu and Sora

This section compares Vidu's AI video generator with OpenAI's Sora side by side. Vidu's showreel is impressive and outperforms current video generators such as Runway and Pika, but it may not yet match the quality of Sora, which has not been released for public use. In specific examples, Vidu's videos show minor inconsistencies, such as a green leaf disappearing or a wooden toy ship on a carpet being misinterpreted. Despite these flaws, Vidu generates realistic hands and faces. The video also points out that Vidu's showreel is in 720p while Sora's examples are in full HD, so the lower resolution may make Vidu's output look worse than it is. The section ends by inviting viewers to apply for access to Vidu through its website, shangu.ai.com.

10:03

🌏 Global AI Developments and the Emergence of Vidu

The final section turns to the broader AI landscape, particularly recent developments from China. It mentions a new language model launched by a Chinese company and a newly unveiled high-speed robot, emphasizing that AI innovation is moving fast beyond the major US tech companies. The video suggests that Vidu, together with these other products, shows that China is a significant player in the global AI race. It closes by asking viewers whether they think Vidu can compete with or surpass Sora, inviting them to apply for access, and encouraging them to stay tuned for more coverage.

Keywords

💡AI Video Generator

An AI video generator is a technology that uses artificial intelligence to create videos automatically. In the video, 'Vidu' is an AI video generator developed by the Chinese company Shu and presented as a competitor to OpenAI's Sora. The script mentions that Vidu can generate a 16-second 1080p video clip with one click.

💡Sora

Sora is OpenAI's AI video generator, used in the video as the benchmark for judging Vidu. The presenter considers whether Vidu might be on par with Sora in generation quality and plays a showreel comparing the outputs of the two.

💡Universal Vision Transformer (UViT)

UViT is the self-developed vision transformer architecture on which Vidu is built. It integrates two kinds of generative AI models: diffusion models and Transformers. The script describes merging them as a next step in generative AI, suggesting that UViT could be more advanced than existing models such as Stable Diffusion.
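The token-based idea behind UViT can be sketched as below. This assumes, without confirmation from the video, that Vidu follows the U-ViT design described in a September 2022 paper, in which the diffusion timestep, the text condition, and patches of the noisy frame are all treated as tokens for one Transformer, and shallow layers feed deep layers through long skip connections. Everything here is illustrative: patchify, build_token_sequence, and forward_with_long_skips are hypothetical names, the timestep "embedding" is a placeholder rather than a learned one, and skips are merged by simple addition for brevity.

```python
from typing import Callable, List

Token = List[float]
Frame = List[List[float]]  # one single-channel frame, stored as rows of pixel values
Block = Callable[[List[Token]], List[Token]]  # stand-in for a Transformer block


def patchify(frame: Frame, patch: int) -> List[Token]:
    """Cut a frame into non-overlapping patch x patch tiles and flatten each tile into one token."""
    height, width = len(frame), len(frame[0])
    if height % patch or width % patch:
        raise ValueError("frame size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([frame[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def build_token_sequence(noisy_frame: Frame, timestep: int,
                         text_tokens: List[Token], patch: int) -> List[Token]:
    """Hypothetical UViT-style input: [timestep token] + [condition tokens] + [patch tokens]."""
    dim = patch * patch
    # Placeholder for a learned timestep embedding: the step value repeated to token width.
    time_token = [float(timestep)] * dim
    # Truncate or zero-pad condition tokens so every token in the sequence has the same width.
    cond_tokens = [t[:dim] + [0.0] * (dim - len(t[:dim])) for t in text_tokens]
    return [time_token] + cond_tokens + patchify(noisy_frame, patch)


def forward_with_long_skips(tokens: List[Token], shallow: List[Block],
                            deep: List[Block]) -> List[Token]:
    """Run shallow blocks, then deep blocks that each receive a saved shallow output (long skip)."""
    if len(shallow) != len(deep):
        raise ValueError("each deep block needs a matching shallow block")
    skips = []
    h = tokens
    for block in shallow:
        h = block(h)
        skips.append(h)
    for block in deep:
        skip = skips.pop()  # last shallow output pairs with the first deep block
        # Simplification: merge the skip by element-wise addition.
        h = block([[a + b for a, b in zip(x, s)] for x, s in zip(h, skip)])
    return h
```

Because every input becomes a token of the same width, a single stack of Transformer blocks can attend across the timestep, the prompt, and all image patches at once, which is one way to read the video's point about the Transformer's grasp of context.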

💡Diffusion Model

The diffusion model is a type of generative AI model that has been used to create images and videos. The script mentions that it has limitations, such as not generating text well and not understanding context. It is part of the UViT architecture, where it is combined with the Transformer model to overcome these limitations.
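The noising process that a diffusion model learns to reverse can be sketched as follows. This is the standard DDPM-style forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ᾱ_t the running product of (1 − β), shown as a general illustration rather than Vidu's actual training code; the linear schedule values and the function names are assumptions.

```python
import math
import random
from typing import List, Tuple


def linear_beta_schedule(steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Per-step noise variances rising linearly from beta_start to beta_end."""
    return [beta_start + (beta_end - beta_start) * i / (steps - 1) for i in range(steps)]


def alpha_bar(betas: List[float], t: int) -> float:
    """Fraction of the original signal's variance left after steps 0..t: prod(1 - beta)."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod


def add_noise(x0: List[float], t: int, betas: List[float],
              rng: random.Random) -> Tuple[List[float], List[float]]:
    """Jump straight to step t: x_t = sqrt(ab) * x0 + sqrt(1 - ab) * noise."""
    ab = alpha_bar(betas, t)
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * n for x, n in zip(x0, noise)]
    # The noise is returned because a denoiser is typically trained to predict it from (xt, t).
    return xt, noise
```

For example, `add_noise([0.5, -0.2, 0.8], 999, linear_beta_schedule(1000), random.Random(0))` returns values that are almost pure noise, since very little of the original signal survives the final step.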

💡Transformer Model

The Transformer is a type of AI model that excels at understanding context. It was introduced in the 2017 paper 'Attention Is All You Need' by researchers at Google and is the backbone of many of today's language models. In the script, it is highlighted as a key component of UViT that, merged with the diffusion model, is expected to produce more coherent and accurate video outputs.
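The core operation from that paper is scaled dot-product attention, softmax(QKᵀ/√d)·V, which is how a Transformer lets each token weigh every other token (the "context" the video refers to). The pure-Python sketch below shows that single operation only; it omits learned projections, multiple heads, and masking, and the function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per token


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1 (max subtracted for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """For each query row, return a weighted average of value rows: softmax(q . k^T / sqrt(d)) v."""
    d = len(k[0])
    out = []
    for query in q:
        # How strongly this query matches each key, scaled by sqrt(d) to keep scores moderate.
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in k]
        weights = softmax(scores)
        # Blend the value rows using those weights.
        out.append([sum(w * row[c] for w, row in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

When all keys match a query equally, the output is the plain average of the values; when one key matches far better, its value dominates.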

💡Generative AI

Generative AI refers to AI systems that can create new content, such as images or videos, that did not exist before. The script discusses how generative AI has evolved, presenting Vidu and its UViT architecture as a step forward from earlier models such as Stable Diffusion.

💡Runway and Pika

Runway and Pika are described in the video as the two best video generators currently available. They serve as a point of comparison, and based on the showreel, the script suggests that Vidu's capabilities may surpass theirs.

💡Resolution

Resolution is the number of pixels that make up each video frame, which determines the level of detail. The script notes that Vidu's showreel videos are 720p rather than full HD, so fine details look less crisp and sharp than in Sora's videos.
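The gap between the 720p showreel and full HD is easy to quantify. This sketch assumes the standard 16:9 frame sizes of 1280 × 720 for 720p and 1920 × 1080 for 1080p; the dictionary and function names are illustrative.

```python
RESOLUTIONS = {
    "720p": (1280, 720),    # standard 16:9 HD frame size (assumed)
    "1080p": (1920, 1080),  # standard 16:9 full HD frame size (assumed)
}


def pixel_count(name: str) -> int:
    """Pixels in one frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def pixel_ratio(higher: str, lower: str) -> float:
    """How many times more pixels a frame at `higher` has than one at `lower`."""
    return pixel_count(higher) / pixel_count(lower)


if __name__ == "__main__":
    for name in RESOLUTIONS:
        print(f"{name}: {pixel_count(name):,} pixels per frame")
    print(f"1080p / 720p = {pixel_ratio('1080p', '720p')}")
```

Under these assumptions a 720p frame has 921,600 pixels and a 1080p frame has 2,073,600, so full HD carries 2.25 times as many pixels, which is why fine detail looks softer in the 720p showreel.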

💡Consistency

Consistency in video generation refers to the AI's ability to keep the video logically and visually coherent from start to finish. The script points out instances where Vidu's outputs are inconsistent, such as a leaf disappearing or hair turning into a red ribbon, in contrast to the more consistent outputs from Sora.
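One crude way to make consistency measurable is to track how much each frame differs from the one before it: an object that suddenly vanishes, like the disappearing leaf, shows up as a spike. The sketch below is a hypothetical heuristic, not how the video or Vidu evaluates output; the function names and the threshold are assumptions, and ordinary camera or subject motion also raises the score, so a flagged frame is only a candidate for inspection.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel values in [0, 1]


def mean_abs_difference(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel change between two equally sized frames."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += abs(x - y)
            count += 1
    return total / count


def flag_abrupt_changes(frames: List[Frame], threshold: float) -> List[int]:
    """Indices of frames whose change from the previous frame exceeds `threshold`."""
    return [i for i in range(1, len(frames))
            if mean_abs_difference(frames[i - 1], frames[i]) > threshold]
```

For instance, with four tiny frames where one pixel jumps from 0.2 to 0.9 at the third frame and stays there, only index 2 is flagged at a threshold of 0.1.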

💡Competition

Competition in the AI space is presented as a positive force that drives innovation and improvement. The script welcomes Vidu's release because it gives Sora a competitor, which could lead to better AI video generation technology.

Highlights

Chinese company Shu announces a Sora competitor, an AI video generator called Vidu.

Vidu's makers claim it is on par with OpenAI's Sora in video generation capabilities.

Vidu can generate a 16-second 1080p video clip with one click.

Vidu is built on a self-developed architecture called Universal Vision Transformer (UViT).

UViT integrates diffusion and Transformer models, a significant step in generative AI.

Diffusion model limitations include poor text generation and lack of context understanding.

Merging the Transformer with the diffusion model aims to enhance coherence and accuracy.

The Transformer model is based on the influential 2017 paper 'Attention Is All You Need' by researchers at Google.

Zhu Jun, Vice Dean of Tsinghua University's Institute of AI, comments on the alignment with Sora's technical roadmap.

Vidu's UViT core technology was proposed before Sora's model architecture.

Vidu showcased videos that outcompete current video generators like Runway and Pika.

Comparisons between Vidu and Sora's videos show mixed results in quality and realism.

Vidu's showreel videos are 720p, lower than Sora's full HD.

Vidu can output 1080p videos, as reported by Global Times.

The application process for using Vidu is available on shangu.ai.com.

China has been releasing innovative AI products, indicating a competitive landscape in AI development.

The presence of Vidu as a competitor is welcomed, as it drives innovation and improvement in AI video generation.