Unified-IO

A new general-purpose model with unprecedented breadth, Unified-IO can perform a wide array of visual and linguistic tasks.

How Does Unified-IO Work?

Unified-IO is the first neural model to perform a large and diverse set of AI tasks spanning classical computer vision, image synthesis, vision-and-language, and natural language processing (NLP). Unified-IO achieves this broad unification by homogenizing every task's input and output into a sequence of tokens drawn from a discrete and finite vocabulary. Dense inputs such as images, masks, and depth maps are converted to sequences using a universal compressor, and sparse structured inputs such as bounding boxes and human joint locations are transcribed into language, which is naturally sequential.
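To make the idea of transcribing sparse structured inputs into a discrete vocabulary concrete, here is a minimal sketch of how a bounding box might be turned into tokens and back. It is an illustration under stated assumptions, not the Unified-IO implementation: the bin count `NUM_BINS`, the `<loc_N>` token format, and the helper names `quantize`, `box_to_tokens`, and `tokens_to_box` are all hypothetical.

```python
from typing import List, Tuple

NUM_BINS = 1000  # assumed number of coordinate bins (illustrative, not from the source)

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def quantize(value: float, extent: float, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, extent] to an integer bin in [0, num_bins - 1]."""
    frac = min(max(value / extent, 0.0), 1.0)  # clamp to the image bounds
    return min(int(frac * num_bins), num_bins - 1)


def box_to_tokens(box: Box, width: int, height: int, label: str) -> List[str]:
    """Transcribe a box and its label into a flat token sequence."""
    x0, y0, x1, y1 = box
    bins = [
        quantize(x0, width),
        quantize(y0, height),
        quantize(x1, width),
        quantize(y1, height),
    ]
    # Hypothetical special tokens for locations, followed by the label words.
    return [f"<loc_{b}>" for b in bins] + label.split()


def tokens_to_box(tokens: List[str], width: int, height: int) -> Tuple[Box, str]:
    """Invert box_to_tokens, recovering each coordinate as its bin center."""
    bins = [int(t[len("<loc_"):-1]) for t in tokens[:4]]
    extents = [width, height, width, height]
    coords = [(b + 0.5) / NUM_BINS * e for b, e in zip(bins, extents)]
    return (coords[0], coords[1], coords[2], coords[3]), " ".join(tokens[4:])


if __name__ == "__main__":
    tokens = box_to_tokens((0, 0, 320, 240), width=640, height=480, label="bike")
    print(tokens)
    print(tokens_to_box(tokens, width=640, height=480))
```

Because every coordinate becomes one token from a finite set, a box occupies the same kind of sequence as ordinary text, at the cost of a quantization error of at most one bin width per coordinate.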

Unifying input and output data in this way enables us to train a single sequence-to-sequence Unified-IO model to perform tasks across more than 80 diverse computer vision and NLP benchmarks.

Unified-IO is a significant milestone in the pursuit of a single, unified, general-purpose system capable of parsing and producing visual, linguistic, and other structured data. Read more about Unified-IO:

Unified-IO: Sequential Modeling for Generally Applicable Vision Models

View Unified-IO Source Code on GitHub

Image + Text to Image

Unified-IO can understand complex inputs that include images, image annotations such as bounding boxes, and text instructions, and produce the corresponding visual output. Choose a type of task to explore these capabilities.

Depth Estimation, Detection, Image Inpainting, Pose Estimation, Segmentation, Segmentation-based Generation, Surface Normals

Depth Estimation

Estimate the relative depth of different objects in the image

What is the depth map of the image?

Image + Text to Text

Unified-IO can understand complex inputs that include images, image annotations such as bounding boxes, and text instructions, and produce the corresponding text output. Choose a type of task to explore these capabilities.

Image Captioning, Region Captioning, Visual Common Sense, Visual Question Answering

Image Captioning

Provide a caption that summarizes an image

What does the image describe?

a black bike parked next to a bed

Text to Image

In addition to describing or manipulating an image given the image and instructions, Unified-IO can generate entirely new images from textual descriptions. Choose a prompt below to see the images Unified-IO can create from text inputs.

Image Generation

Create a new image from a text description

Generate an image of "small personal pizza with bacon and spinach".

Text to Text

In addition to the many image understanding and synthesis tasks shown above, Unified-IO can also perform many different natural language processing (NLP) tasks. Choose a prompt below to see how Unified-IO performs several standard NLP tasks.

Paraphrase Detection, Question Answering, Textual Entailment

Paraphrase Detection

Determine whether one statement correctly rephrases another

* Twenty-eight people were believed to have been spending Christmas Day with the caretaker of the St Sophia's camp, when the mudslide smashed into two cabins.
* Twenty-seven people were believed to have been spending Christmas Day with the caretaker of Saint Sophia Camp, a Greek Orthodox facility, when the mudslide roared through.

Yes, they are equivalent.

Credits

This demo page is built and maintained by PRIOR and colleagues at the Allen Institute for AI. Our team seeks to advance computer vision to create AI systems that see, explore, learn, and reason about the world.

This research was made possible with cloud TPUs from Google’s TPU Research Cloud (TRC).

Learn more about Unified-IO and the PRIOR team on the AI2 Blog. Follow @allenai_ai on Twitter and subscribe to the AI2 Newsletter to stay current on news and research coming out of AI2.

Research Team

Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Ani Kembhavi

Demo Design + Development

Sam Stuesser
Sam Skjonsberg
Jon Borchardt
Carissa Schoenick
Michael Schmitz
