New AI Video Generator "Vidu" Competes with SORA
TLDR
Chinese company Shengshu Technology introduces Vidu, an AI video generator rivaling SORA. Vidu is built on the Universal Vision Transformer (UViT) architecture, which combines diffusion and Transformer models to produce more coherent and accurate videos. The tool can generate a 16-second 1080p clip with one click and is claimed to be on par with SORA, despite some inconsistencies in the generated videos. The technology behind Vidu was first proposed in September 2022, predating SORA's model architecture. Interested users can apply for access at shengshu-ai.com. Vidu adds to the global AI competition and offers an alternative to existing video generators such as Runway and Pika.
Takeaways
- A Chinese company, Shengshu Technology, has announced a new AI video generator named Vidu, positioning it as a competitor to SORA.
- Vidu can generate a 16-second 1080p video clip with a single click, showcasing its efficiency in video generation.
- Built on a self-developed architecture called Universal Vision Transformer (UViT), Vidu integrates diffusion and Transformer AI models.
- The combination of diffusion and Transformer models is seen as an advancement in generative AI, potentially overcoming limitations of earlier diffusion-only models.
- The Transformer model, known for its ability to understand context, is expected to enhance the coherence and accuracy of generated content.
- The core technology of UViT was initially proposed by Vidu's research team in September 2022, prior to SORA's model architecture.
- In a direct comparison, Vidu's video quality appears to be competitive, though not necessarily on par with SORA, which is yet to be publicly released.
- Vidu demonstrates a strong ability to generate realistic hands and other detailed elements in its videos.
- Despite some inconsistencies in the generated videos, Vidu's results are considered notably better than current alternatives like Runway and Pika.
- Interested users can apply to use Vidu through the company's website, shengshu-ai.com, where they can leave their contact information for access.
- China has recently been active in the AI space, with new advancements in AI models and robots, indicating a global competition in AI technology.
Q & A
What is the name of the AI video generator announced by the Chinese company Shengshu Technology?
-The AI video generator announced by the Chinese company Shengshu Technology is called 'Vidu'.
What is the core technology behind Vidu's video generation?
-Vidu's core technology is a self-developed vision model architecture called Universal Vision Transformer (UViT), which integrates two kinds of AI model, diffusion and Transformer, for text-to-video generation.
What are the potential advantages of merging a Transformer model with a diffusion model?
-Merging a Transformer model with a diffusion model is considered the next step in generative AI, potentially allowing for more coherent and accurate video or image generation due to the Transformer's ability to understand context.
How does Vidu compare to other video generators in terms of generating hands?
-Vidu generates hands very well, with five fingers on each hand and a realistic appearance, which is a significant improvement over other video generators.
What is the output resolution of Vidu's video generation?
-Vidu can output 1080p resolution videos, although the showreel shown in the video is only in 720p.
How can one apply to use Vidu's AI video generator?
-To apply to use Vidu's AI video generator, one can visit shengshu-ai.com, scroll down to the video generation section, and fill out the form with their name, phone number, and company name.
What is the significance of the Universal Vision Transformer (UViT) in the context of AI video generation?
-The Universal Vision Transformer (UViT) is significant because it combines the advantages of diffusion and Transformer models, potentially leading to more coherent and realistic video generation.
How does Vidu's video generation compare to SORA's in terms of realism?
-While Vidu's showreel seems to produce high-quality and realistic videos, it is not yet on par with SORA, which has not been released for public use but is expected to set a high standard.
What are some of the limitations observed in Vidu's video generation compared to SORA?
-Some limitations observed in Vidu's video generation compared to SORA include inconsistencies in elements like hair transforming into a red ribbon and a green leaf disappearing, which were not seen in SORA's examples.
What are some recent advancements in AI from China that have been mentioned in the script?
-Recent advancements in AI from China mentioned in the script include the launch of Vidu, the release of SenseNova 5.0, and the unveiling of the S1 robot by Astribot.
What is the significance of competition in the AI video generation space?
-Competition in the AI video generation space is significant as it drives innovation, improves technology, and can lead to better products and services for consumers.
Outlines
Introduction to SORA Competitor: Vidu's AI Video Generator
The video script introduces a new AI video generator developed by a Chinese company called Shengshu Technology, which claims to be a competitor to OpenAI's SORA. The tool, named Vidu, is capable of generating a 16-second 1080p video clip with a single click. It is built on a novel architecture called Universal Vision Transformer (UViT), which combines the strengths of both diffusion and Transformer models. This integration is seen as a significant advancement in generative AI, potentially overcoming the limitations of previous models like Stable Diffusion. The script also mentions that the core technology of UViT was first proposed by Vidu's research team, predating SORA's model architecture. The video includes a showreel to demonstrate the capabilities of Vidu, suggesting that it could be a strong contender in the AI video generation space.
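As a rough illustration of what "combining diffusion and Transformer models" can mean, the sketch below shows how a Transformer-based diffusion backbone might turn its inputs into one token sequence: a timestep token, prompt tokens, and flattened patches of the noisy frames. Vidu's internals are not described in the video, so everything here is an assumption: the function names `patchify` and `build_token_sequence`, the constant-valued time token, and the use of plain Python lists are all hypothetical simplifications, not Vidu's or UViT's actual code.

```python
from typing import List

Frame = List[List[float]]   # one grayscale frame: rows of pixel values
Token = List[float]         # one flattened token vector


def patchify(frame: Frame, patch: int) -> List[Token]:
    """Split an H x W frame into flattened, non-overlapping patch tokens."""
    height, width = len(frame), len(frame[0])
    if height % patch or width % patch:
        raise ValueError("frame size must be divisible by the patch size")
    tokens: List[Token] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([
                frame[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return tokens


def build_token_sequence(
    noisy_frames: List[Frame],
    timestep: int,
    prompt_tokens: List[Token],
    patch: int,
) -> List[Token]:
    """Concatenate time, prompt, and noisy-patch tokens into one sequence.

    Hypothetical simplification: the time token is just the timestep value
    repeated; a real model would use a learned or sinusoidal embedding.
    """
    dim = patch * patch
    if any(len(tok) != dim for tok in prompt_tokens):
        raise ValueError("prompt tokens must match the patch token size")
    sequence: List[Token] = [[float(timestep)] * dim]
    sequence.extend(prompt_tokens)
    for frame in noisy_frames:
        sequence.extend(patchify(frame, patch))
    return sequence


# Two toy 4x4 "noisy" frames, split into 2x2 patches.
frames = [[[float(r * 4 + c) for c in range(4)] for r in range(4)] for _ in range(2)]
prompt = [[0.0] * 4, [1.0] * 4]
sequence = build_token_sequence(frames, timestep=10, prompt_tokens=prompt, patch=2)
```

In a design like this, the Transformer attends over the whole sequence at once, so every patch of every frame can draw on the prompt and on every other patch. That shared context is the property the video credits for more coherent output; the denoising network itself is omitted here.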
Comparative Analysis of Vidu and SORA's AI Video Generation
This paragraph presents a side-by-side comparison between Vidu's AI video generator and OpenAI's SORA. The script highlights that while Vidu's showreel is impressive and outperforms current video generators like Runway and Pika, it may not yet match the quality of SORA, which has yet to be released for public use. The comparison includes specific examples where Vidu's videos show minor inconsistencies, such as a green leaf disappearing or a wooden toy ship on a carpet being misinterpreted. Despite these flaws, Vidu's ability to generate realistic hands and faces is noted. The video also addresses the resolution difference between Vidu's 720p showcase and SORA's full HD videos, suggesting that the lower resolution may affect the perceived quality of Vidu's output. The paragraph concludes with a call to action for viewers to apply for access to Vidu through its website, shengshu-ai.com.
Global AI Developments and the Emergence of Vidu
The final paragraph of the script shifts focus to the broader landscape of AI advancements, particularly highlighting recent developments from China. It mentions the launch of a new language model by a Chinese company and the unveiling of a high-speed robot, emphasizing the rapid pace of AI innovation beyond the major tech giants in the United States. The script suggests that the emergence of Vidu, along with these other AI products, indicates that China is a significant player in the global AI race. The video concludes by encouraging viewers to share their thoughts on Vidu and whether they believe it can compete with or surpass SORA. It also invites viewers to apply for access to Vidu and to stay tuned for more content on the topic.
Keywords
AI Video Generator
SORA
Universal Vision Transformer (UViT)
Diffusion Model
Transformer Model
Generative AI
Runway and Pika
Resolution
Consistency
Competition
Highlights
Chinese company Shengshu Technology announces a SORA competitor, an AI video generator called Vidu.
Vidu claims to be on par with OpenAI's Sora in video generation capabilities.
Vidu can generate a 16-second 1080p video clip with one click.
Built on a self-developed architecture called Universal Vision Transformer (UViT).
UViT integrates diffusion and Transformer models, a significant step in generative AI.
Diffusion model limitations include poor text generation and lack of context understanding.
Merging Transformer with diffusion model aims to enhance coherence and accuracy.
The Transformer model is based on the influential 'Attention Is All You Need' paper by Google researchers.
Zhu Jun, Vice Dean of Tsinghua University's Institute for AI, comments on the alignment with Sora's technical roadmap.
Vidu's UViT core technology was proposed before Sora's model architecture.
Vidu showcased videos that outcompete current video generators like Runway and Pika.
Comparisons between Vidu and Sora's videos show mixed results in quality and realism.
Vidu's video resolution is 720p, which is lower than Sora's full HD.
Vidu can output 1080p videos, as reported by Global Times.
Applications to use Vidu are open at shengshu-ai.com.
China has been releasing innovative AI products, indicating a competitive landscape in AI development.
The presence of Vidu as a competitor is welcomed, as it drives innovation and improvement in AI video generation.