How Does Unified-IO Work?
Unified-IO is the first neural model to perform a large and diverse set of AI tasks spanning classical computer vision, image synthesis, vision-and-language, and natural language processing (NLP). Unified-IO achieves this broad unification by homogenizing every task's input and output into a sequence of tokens drawn from a discrete and finite vocabulary. Dense inputs such as images, masks, and depth maps are converted to sequences using a universal compressor, and sparse structured inputs such as bounding boxes and human joint locations are transcribed into language, which is naturally sequential.
This approach of unifying input and output data enables us to train a single sequence-to-sequence Unified-IO model to perform tasks across more than 80 diverse computer vision and NLP benchmarks.
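To make the idea of transcribing sparse structured data into language more concrete, here is a minimal sketch of how a bounding box might be turned into a short token sequence. It is an illustration under stated assumptions, not Unified-IO's actual tokenizer: the number of bins (`NUM_BINS`), the `<loc_k>` token format, and the function names are all hypothetical.

```python
from typing import List, Tuple

# Assumed number of discrete location bins; chosen for illustration only.
NUM_BINS = 1000


def quantize(value: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    frac = min(max(value / size, 0.0), 1.0)  # clamp to the image extent
    return min(int(frac * num_bins), num_bins - 1)


def box_to_tokens(
    box: Tuple[float, float, float, float],
    width: float,
    height: float,
    label: str,
) -> List[str]:
    """Serialize a box (x0, y0, x1, y1) plus a text label as a token sequence.

    The "<loc_k>" token format is hypothetical.
    """
    x0, y0, x1, y1 = box
    bins = [
        quantize(x0, width),
        quantize(y0, height),
        quantize(x1, width),
        quantize(y1, height),
    ]
    return [f"<loc_{b}>" for b in bins] + label.split()


def tokens_to_box(
    tokens: List[str], width: float, height: float, num_bins: int = NUM_BINS
) -> Tuple[float, float, float, float]:
    """Decode the first four location tokens back to pixel coordinates (bin centers)."""
    bins = [int(t[len("<loc_"):-1]) for t in tokens[:4]]
    sizes = [width, height, width, height]
    x0, y0, x1, y1 = [(b + 0.5) / num_bins * s for b, s in zip(bins, sizes)]
    return (x0, y0, x1, y1)


if __name__ == "__main__":
    tokens = box_to_tokens((0, 0, 320, 240), width=640, height=480, label="dog")
    print(tokens)
    print(tokens_to_box(tokens, width=640, height=480))
```

Because the coordinates become ordinary vocabulary items, the same sequence-to-sequence model can read or emit a box in the same way it reads or emits words. The price is a quantization error of at most one bin width per coordinate.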
Unified-IO is a significant milestone in the pursuit of a single, unified, general-purpose system capable of parsing and producing visual, linguistic, and other structured data. Read more about Unified-IO.
Image + Text to Image
Unified-IO can understand complex inputs that include images, image annotations such as bounding boxes, and text instructions, and produce the corresponding visual output. Choose a type of task to explore these capabilities.
Image + Text to Text
Unified-IO can understand complex inputs that include images, image annotations such as bounding boxes, and text instructions, and produce the corresponding text output. Choose a type of task to explore these capabilities.
Text to Image
In addition to describing or manipulating an image according to instructions, Unified-IO can generate entirely new images from textual descriptions. Choose a prompt below to see the images Unified-IO can create from text inputs.
Image Generation
Create a new image from a text description
Text to Text
In addition to the many image understanding and synthesis tasks shown above, Unified-IO can also perform many different natural language processing (NLP) tasks. Choose a prompt below to see how Unified-IO performs several standard NLP tasks.
Paraphrase Detection
Determine whether one statement correctly rephrases another
Credits
This demo page is built and maintained by PRIOR and colleagues at the Allen Institute for AI. Our team seeks to advance computer vision to create AI systems that see, explore, learn, and reason about the world.
This research was made possible with cloud TPUs from Google’s TPU Research Cloud (TRC).
Learn more about Unified-IO and the PRIOR team on the AI2 Blog. Follow @allenai_ai on Twitter and subscribe to the AI2 Newsletter to stay current on news and research coming out of AI2.
Research Team
- Jiasen Lu
- Christopher Clark
- Rowan Zellers
- Roozbeh Mottaghi
- Ani Kembhavi
Demo Design + Development
- Sam Stuesser
- Sam Skjonsberg
- Jon Borchardt
- Carissa Schoenick
- Michael Schmitz