Tutorials

How to run Wan 2.2 open-source video generation model on a cloud GPU

Learn how to self-host Wan 2.2, the leading open-source AI video generation model, on an H100 GPU.
Deepak Manoor
Posted: August 29, 2025

    Video has become a central medium for communication, entertainment, and sharing ideas, and AI is making video generation faster and more accessible. Open-source models are advancing rapidly, narrowing the gap with, and in some cases surpassing, proprietary systems. Tencent’s Hunyuan and Alibaba’s Wan (Tongyi Wanxiang) series are prime examples, with the latest release, Wan 2.2, standing out as the best open-source AI video generator currently available. Built on a Mixture-of-Experts diffusion architecture, it combines cinematic control with high-fidelity aesthetics and has outperformed popular models such as OpenAI Sora, Pika 2.2, Runway 3, and Luma Ray 2.

    Here’s a brief overview of Wan 2.2 specifications:

    Alibaba Wan 2.2
    Architecture: Diffusion model with a two-expert Mixture of Experts (MoE): a high-noise expert plans the global layout in early denoising steps, and a low-noise expert refines details later
    Model Variants: Text-to-video: Wan2.2-T2V-A14B (MoE); Text-image-to-video: Wan2.2-TI2V-5B (high-compression VAE, hybrid T2V+I2V); Image-to-video: Wan2.2-I2V-A14B (MoE); Speech-to-video: Wan2.2-S2V-14B
    Resolution and Frame Rate: 720p, 24 fps (16:9 aspect ratio)
    License: Apache 2.0

    Wan 2.2 is currently the highest-performing open-weights video generation model on the Artificial Analysis leaderboard.

    Best AI video generators

    Source: Artificial Analysis

    Benchmarks shared by the Wan video team also indicate high scores in comparison with other SOTA models such as OpenAI Sora, Seedance 1.0, and Kling 2.0.

    Wan 2.2 Performance Benchmarks

    Source: Wan Inc

    How to run Wan 2.2 on H100 GPUs

    Prerequisites to self-host Wan 2.2

    Create a GPU virtual machine (VM) on Ori Global Cloud. We recommend NVIDIA H100 GPUs to reduce video generation time. With a single H100 GPU, generating a 720p video takes 20-25 minutes; a setup with multiple H100 GPUs can dramatically speed up the process (see the multi-GPU example in Step 7).

    Quick Tip

    Use the init script when creating the VM so NVIDIA CUDA drivers, frameworks such as PyTorch or TensorFlow, and Jupyter Notebook are preinstalled for you.
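
    Once the VM is up, it can help to confirm the GPU and drivers are visible before installing anything. A minimal check, assuming PyTorch was preinstalled by the init script:

    Bash/Shell
    # Confirm the H100 and NVIDIA driver are visible
    nvidia-smi
    # Confirm PyTorch can see the GPU (assumes PyTorch was preinstalled by the init script)
    python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"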

    Step 1: SSH into your VM and create a virtual environment

    Bash/Shell
    apt install python3.12-venv
    python3.12 -m venv wan-env
    source wan-env/bin/activate

    Step 2: Clone the GitHub repository

    Bash/Shell
    git clone https://github.com/Wan-Video/Wan2.2.git
    cd Wan2.2

    Step 3: Install dependencies

    Bash/Shell
    pip install -r requirements.txt

    Step 4: If you run into errors installing flash-attn, install these packages and then run the requirements command again

    Bash/Shell
    pip install torch torchvision torchaudio
    pip install packaging
    pip install --upgrade psutil
    pip install ninja
    pip install flash-attn --no-build-isolation
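
    After the build completes, a quick sanity check can confirm the installation before re-running the requirements step (the flash-attn package imports as flash_attn):

    Bash/Shell
    # Verify that FlashAttention imports cleanly and CUDA is usable
    python -c "import flash_attn; print(flash_attn.__version__)"
    python -c "import torch; print(torch.cuda.is_available())"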

    Step 5: Install Jupyter to make it easy to run prompts and download the generated video files

    Bash/Shell
    pip install notebook
    jupyter notebook --allow-root --no-browser --ip=0.0.0.0
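
    Because Jupyter runs on the remote VM, you will typically reach it through an SSH tunnel from your local machine. A minimal example, assuming the default port 8888 and placeholder credentials (replace the username and VM IP with your own):

    Bash/Shell
    # Run on your local machine, then open http://localhost:8888 and paste the token printed by Jupyter
    ssh -L 8888:localhost:8888 ubuntu@<your-vm-ip>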

    Step 6: Download the model with the Hugging Face command line interface (CLI)

    Bash/Shell
    pip install "huggingface_hub[cli]"

    Text-to-video (T2V):

    Bash/Shell
    huggingface-cli download Wan-AI/Wan2.2-T2V-A14B --local-dir ./Wan2.2-T2V-A14B

    Image-to-video (I2V):

    Bash/Shell
    huggingface-cli download Wan-AI/Wan2.2-I2V-A14B --local-dir ./Wan2.2-I2V-A14B

    Speech-to-video (S2V):

    Bash/Shell
    huggingface-cli download Wan-AI/Wan2.2-S2V-14B --local-dir ./Wan2.2-S2V-14B

    Note: There is also a 5B hybrid model, Wan2.2-TI2V-5B, that combines T2V and I2V capabilities. You can find download instructions in its model card.
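
    For reference, downloading it follows the same pattern as the other variants; the repository id below assumes the standard Wan-AI naming, so double-check it against the model card:

    Bash/Shell
    # Hybrid text/image-to-video 5B model; confirm the exact repo id on the model card
    huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./Wan2.2-TI2V-5B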

    Step 7: Run video generation

    Text-to-video (T2V):

    Bash/Shell
    python generate.py --task t2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-T2V-A14B --offload_model True --convert_model_dtype --prompt "A large, majestic dragon with olive green scales and flaming red eyes is set against the backdrop of a serene, snowy valley. The video begins with a tracking shot of the valley and then the camera zooms in on the dragon highlighting the details on its face. To maintain visual clarity of this video, every element within the frame is crisp and discernible."
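
    If you provisioned multiple H100s, the upstream Wan2.2 repository documents multi-GPU inference with torchrun, FSDP, and Ulysses sequence parallelism; the flags below follow the README's multi-GPU example and should be verified against your checkout:

    Bash/Shell
    # Example for 8 GPUs; adjust --nproc_per_node and --ulysses_size to match your GPU count
    torchrun --nproc_per_node=8 generate.py --task t2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-T2V-A14B --dit_fsdp --t5_fsdp --ulysses_size 8 --prompt "Your prompt here"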

    Image-to-video (I2V):

    Bash/Shell
    python generate.py --task i2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-I2V-A14B --offload_model True --convert_model_dtype --image ./servers.png --prompt "Turn the image into smooth first person view (FPV) footage"

    Note: The speech-to-video model, which turns an image into a video with the help of an input audio file, took a long time to finish during our testing and did not integrate the audio well.

    If you’d like to use ComfyUI on a cloud GPU to generate videos with Wan 2.2, check out our Genmo Mochi 1 tutorial that uses ComfyUI. The workflow download instructions are available here.

    How good is Wan 2.2?

    We found Wan 2.2 to be the best open-weights video generation model currently available. From top-notch aesthetics to flexible camera control and high-fidelity content, Wan 2.2 impressed us with both its text-to-video and image-to-video generation capabilities. The model showcases excellent prompt understanding and can generate videos with realistic facial expressions and fairly accurate anatomy. However, it did struggle with text rendering, especially longer and punctuated sentences.

    Here are a couple of video clips from our model testing:

    Text-to-video:

    Image-to-Video:

    Chart your AI reality on Ori

    Ori Global Cloud provides flexible infrastructure for any team, model, and scale. Backed by top-tier GPUs, performant storage, and AI-ready networking, Ori enables growing AI businesses and enterprises to deploy their AI models and applications.
