How to run Wan 2.2 open-source video generation model on a cloud GPU

Video has become a central medium for communication, entertainment, and sharing ideas, and AI is making video generation faster and more accessible. Open-source models are advancing rapidly, narrowing the gap with, and in some cases surpassing, proprietary systems. Tencent’s Hunyuan and Alibaba’s Wan (Tongyi Wanxiang) series are prime examples, with the latest release, Wan 2.2, standing out as the best open-source AI video generator currently available. Built on a Mixture-of-Experts diffusion architecture, it combines cinematic control with high-fidelity aesthetics and has outperformed popular models such as OpenAI Sora, Pika 2.2, Runway 3, and Luma Ray 2.
Here’s a brief overview of Wan 2.2 specifications:
| Specification | Alibaba Wan 2.2 |
|---|---|
| Architecture | Diffusion model with a two-expert Mixture of Experts (MoE): a high-noise expert plans global layout in early steps, and a low-noise expert refines details later |
| Model Variants | Text-to-video: Wan2.2-T2V-A14B (MoE); Text-image-to-video: Wan2.2-TI2V-5B (high-compression VAE, hybrid T2V+I2V); Image-to-video: Wan2.2-I2V-A14B (MoE); Speech-to-video: Wan2.2-S2V-14B |
| Resolution and Frame Rate | 720p at 24 fps (16:9 aspect ratio) |
| License | Apache 2.0 |
Wan 2.2 is currently the highest-performing open-weights video generation model on the Artificial Analysis leaderboard.

Source: Artificial Analysis
Benchmarks shared by the Wan team also indicate that it scores highly compared with other SOTA models such as OpenAI Sora, Seedance 1.0, and Kling 2.0.

Source: Wan Inc
How to run Wan 2.2 on H100 GPUs
Prerequisites to self-host Wan 2.2
Create a GPU virtual machine (VM) on Ori Global Cloud. We recommend NVIDIA H100 GPUs to reduce video generation time. With a single H100 GPU, generating a 720p video takes 20-25 minutes; a setup with multiple H100 GPUs can dramatically speed up the video generation process.
Use the init script when creating the VM so that NVIDIA CUDA drivers, frameworks such as PyTorch and TensorFlow, and Jupyter Notebook are preinstalled for you.
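Once the VM is up, it is worth a quick sanity check that the GPU and driver are visible before installing anything. Assuming the init script ran and the NVIDIA driver is in place, the following should list the H100 along with its driver and CUDA versions:
nvidia-smi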
Step 1: SSH into your VM and create a virtual environment
apt install python3.12-venv
python3.12 -m venv wan-env
source wan-env/bin/activate
Step 2: Clone the GitHub repository
git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2
Step 3: Install dependencies
pip install -r requirements.txt
Step 4: If you run into errors installing FlashAttention, install these packages first and then re-run the requirements command
pip install torch torchvision torchaudio
pip install packaging
pip install --upgrade psutil
pip install ninja
pip install flash-attn --no-build-isolation
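Before moving on, you can confirm the FlashAttention build succeeded by importing it. A quick check, assuming the wheel compiled against the PyTorch and CUDA versions in your environment:
python -c "import flash_attn; print(flash_attn.__version__)"
If this prints a version number without errors, the dependency install is complete.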
Step 5: We installed Jupyter to make it easy to run the prompts and download the video files
pip install notebook
jupyter notebook --allow-root --no-browser --ip=0.0.0.0
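Jupyter listens on port 8888 by default. Rather than exposing that port publicly, you can tunnel it over SSH from your local machine; a minimal sketch, assuming the default port and substituting your own VM username and IP address:
ssh -N -L 8888:localhost:8888 <user>@<vm-ip>
You can then open the tokenized URL printed by the jupyter notebook command in a local browser.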
Step 6: Download the models with the Hugging Face command-line interface (CLI)
pip install "huggingface_hub[cli]"
Text-to-video (T2V):
huggingface-cli download Wan-AI/Wan2.2-T2V-A14B --local-dir ./Wan2.2-T2V-A14B
Image-to-video (I2V):
huggingface-cli download Wan-AI/Wan2.2-I2V-A14B --local-dir ./Wan2.2-I2V-A14B
Speech-to-video (S2V):
huggingface-cli download Wan-AI/Wan2.2-S2V-14B --local-dir ./Wan2.2-S2V-14B
Note: There is also a 5B hybrid model that combines T2V and I2V capabilities. You can find the download instructions in this model card.
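These checkpoint downloads are large, so it is worth confirming that the VM's disk has enough free space before (or while) they run:
df -h .
If space is tight, download only the variant you plan to use.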
Step 7: Run video generation
Text-to-video (T2V):
python generate.py --task t2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-T2V-A14B --offload_model True --convert_model_dtype --prompt "A large, majestic dragon with olive green scales and flaming red eyes is set against the backdrop of a serene, snowy valley. The video begins with a tracking shot of the valley and then the camera zooms in on the dragon highlighting the details on its face. To maintain visual clarity of this video, every element within the frame is crisp and discernible."
Image-to-video (I2V):
python generate.py --task i2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-I2V-A14B --offload_model True --convert_model_dtype --image ./servers.png --prompt "Turn the image into smooth first person view (FPV) footage"
Note: The speech-to-video model, which turns an image into a video with the help of an input audio file, took a long time to finish during our testing and could not integrate the audio well.
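If your VM has multiple H100s, the Wan2.2 repository's README also documents a multi-GPU invocation that shards the model with FSDP and parallelizes attention with Ulysses. A sketch for an 8-GPU machine, assuming the flag names below match your checkout (run python generate.py --help to confirm); the prompt is shortened here for brevity:
torchrun --nproc_per_node=8 generate.py --task t2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-T2V-A14B --dit_fsdp --t5_fsdp --ulysses_size 8 --prompt "A tracking shot of a majestic dragon in a serene, snowy valley"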
If you’d like to use ComfyUI on a cloud GPU to generate videos with Wan 2.2, check out our Genmo Mochi 1 tutorial that uses ComfyUI. The workflow download instructions are available here.
How good is Wan 2.2?
We found Wan 2.2 to be the best open-weights video generation model currently available. From top-notch aesthetics to flexible camera control and high-fidelity content, Wan 2.2 impressed us with both its text-to-video and image-to-video capabilities. The model showcases excellent prompt understanding and can generate videos with realistic facial expressions and fairly accurate anatomy. However, it did struggle with text rendering, especially longer and punctuated sentences.
Here are a couple of video clips from our model testing:
Text-to-video:
Image-to-video:
Chart your AI reality on Ori
Ori Global Cloud provides flexible infrastructure for any team, model, and scale. Backed by top-tier GPUs, performant storage, and AI-ready networking, Ori enables growing AI businesses and enterprises to deploy their AI models and applications:
- Deploy Private Clouds for flexible and secure enterprise AI.
- Leverage GPU Instances as on-demand virtual machines.
- Operate Inference Endpoints effortlessly at any scale.
- Scale GPU Clusters for training and inference.
- Manage AI workloads on Serverless Kubernetes without infrastructure overhead.
