Unified-IO

A new general-purpose model with unprecedented breadth, Unified-IO can perform a wide array of visual and linguistic tasks.

  • Image Captioning
  • Region Captioning
  • Visual Question Answering
  • Visual Common Sense
  • Detection
  • Segmentation
  • Pose Estimation
  • Depth Estimation
  • Surface Normals
  • Image Generation
  • Image Inpainting
  • Segmentation-based Generation

How Does Unified-IO Work?

Unified-IO is the first neural model to perform a large and diverse set of AI tasks spanning classical computer vision, image synthesis, vision-and-language, and natural language processing (NLP). Unified-IO achieves this broad unification by homogenizing every task's input and output into a sequence of tokens drawn from a discrete and finite vocabulary. Dense inputs such as images, masks, and depth maps are converted to sequences using a universal compressor, and sparse structured inputs such as bounding boxes and human joint locations are transcribed into language, which is naturally sequential.
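
As a rough illustration of how a sparse structured input such as a bounding box could be transcribed into tokens from a finite vocabulary, the sketch below quantizes pixel coordinates into location bins. The bin count, the `<loc_N>` token format, and the function names are assumptions made for this example; the page does not specify how Unified-IO encodes coordinates.

```python
from typing import List, Sequence, Tuple

NUM_BINS = 1000  # assumed number of location bins; not stated on this page


def box_to_tokens(
    box: Tuple[float, float, float, float],
    image_size: Tuple[int, int],
    num_bins: int = NUM_BINS,
) -> List[str]:
    """Quantize a pixel-space box (x0, y0, x1, y1) into four location tokens."""
    width, height = image_size
    x0, y0, x1, y1 = box
    tokens = []
    for value, extent in ((x0, width), (y0, height), (x1, width), (y1, height)):
        # Normalize to [0, 1], then map to one of num_bins discrete bins.
        frac = min(max(value / extent, 0.0), 1.0)
        bin_index = min(int(frac * num_bins), num_bins - 1)
        tokens.append(f"<loc_{bin_index}>")  # hypothetical token format
    return tokens


def tokens_to_box(
    tokens: Sequence[str],
    image_size: Tuple[int, int],
    num_bins: int = NUM_BINS,
) -> Tuple[float, float, float, float]:
    """Invert box_to_tokens, returning the center of each bin in pixels."""
    width, height = image_size
    # Strip the "<loc_" prefix and ">" suffix to recover the bin index.
    centers = [(int(token[5:-1]) + 0.5) / num_bins for token in tokens]
    return (
        centers[0] * width,
        centers[1] * height,
        centers[2] * width,
        centers[3] * height,
    )
```

Because the tokens are ordinary vocabulary entries, a box can sit in the same sequence as the words of a text prompt. The round trip loses at most one bin width of precision per coordinate.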

This approach of unifying input and output data enables us to train a single sequence-to-sequence Unified-IO model to perform tasks across more than 80 diverse computer vision and NLP benchmarks.
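
One way to picture a single sequence-to-sequence interface is to give every task the same example shape: a text prompt, optional image tokens, and a target that is either text or an image. The sketch below uses prompt strings that appear in the demos on this page; the `Seq2SeqExample` class, the `PROMPTS` table, and `build_example` are hypothetical names introduced only for illustration and are not the model's actual data pipeline.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Seq2SeqExample:
    prompt: str                               # text instruction for the task
    image_tokens: Optional[Tuple[int, ...]]   # discrete image tokens, if any
    target_kind: str                          # "text" or "image"


# Prompt wording taken from the demos on this page; the task keys are assumed.
PROMPTS = {
    "depth_estimation": ("What is the depth map of the image?", "image"),
    "image_captioning": ("What does the image describe?", "text"),
    "image_generation": ('Generate an image of "{description}".', "image"),
}


def build_example(
    task: str,
    image_tokens: Optional[Sequence[int]] = None,
    description: str = "",
) -> Seq2SeqExample:
    """Wrap any supported task in the same (prompt, image, target) shape."""
    template, target_kind = PROMPTS[task]
    prompt = template.format(description=description)
    tokens = tuple(image_tokens) if image_tokens is not None else None
    return Seq2SeqExample(prompt, tokens, target_kind)
```

With this shape, captioning, depth estimation, and text-to-image generation differ only in their prompt and in whether the target is decoded as text or as image tokens.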

Unified-IO is a significant milestone in the pursuit of a single, unified, general-purpose system capable of parsing and producing visual, linguistic, and other structured data. Read more about Unified-IO.

Image + Text to Image

Unified-IO can understand complex inputs that include images, image annotations such as bounding boxes, and text instructions, and produce the corresponding visual output. Choose a type of task to explore these capabilities.

Depth Estimation

Estimate the relative depth of different objects in the image
What is the depth map of the image?

Image + Text to Text

Unified-IO can understand complex inputs that include images, image annotations such as bounding boxes, and text instructions, and produce the corresponding text output. Choose a type of task to explore these capabilities.

Image Captioning

Provide a caption that summarizes an image
What does the image describe?
a black bike parked next to a bed

Text to Image

In addition to describing or manipulating images given the image and instructions, Unified-IO can also generate entirely new images based on textual descriptions. Choose a prompt below to see the images Unified-IO can create based on text inputs.

Image Generation

Create a new image from a text description
Generate an image of "small personal pizza with bacon and spinach".

Text to Text

In addition to the many image understanding and synthesis tasks shown above, Unified-IO can also perform many different natural language processing (NLP) tasks. Choose a prompt below to see how Unified-IO performs several standard NLP tasks.

Paraphrase Detection

Determine whether one statement correctly rephrases another
  • Twenty-eight people were believed to have been spending Christmas Day with the caretaker of the St Sophia's camp, when the mudslide smashed into two cabins.
  • Twenty-seven people were believed to have been spending Christmas Day with the caretaker of Saint Sophia Camp, a Greek Orthodox facility, when the mudslide roared through.
Yes, they are equivalent.

Credits

This demo page is built and maintained by PRIOR and colleagues at the Allen Institute for AI. Our team seeks to advance computer vision to create AI systems that see, explore, learn, and reason about the world.

This research was made possible with cloud TPUs from Google’s TPU Research Cloud (TRC).

Learn more about Unified-IO and the PRIOR team on the AI2 Blog. Follow @allenai_ai on Twitter and subscribe to the AI2 Newsletter to stay current on news and research coming out of AI2.

Research Team

  • Jiasen Lu
  • Christopher Clark
  • Rowan Zellers
  • Roozbeh Mottaghi
  • Ani Kembhavi

Demo Design + Development

  • Sam Stuesser
  • Sam Skjonsberg
  • Jon Borchardt
  • Carissa Schoenick
  • Michael Schmitz