Posted by: Paul Ruiz – Senior Developer Relations Engineer and Kris Tonthat – Technical Writer
Earlier this year, we previewed on-device text-to-image generation with diffusion models for Android via MediaPipe Solutions. Today, this is available as an early, experimental solution, Image Generator, for developers to try out on Android devices. Image Generator makes it easy to generate images entirely on-device, in as little as 15 seconds on higher-end devices. We can’t wait to see what you all make!
There are three main ways to use the new MediaPipe Image Generator task:
- Text-to-image generation based on text prompts using standard diffusion models.
- Controllable text-to-image generation based on text prompts and conditioning images using diffusion plugins.
- Customized text-to-image generation based on text prompts using Low-Rank Adaptation (LoRA) weights. These allow you to create images of specific concepts that you pre-define for your own use cases.
Models
Before we get into all the fun and exciting parts of this new MediaPipe task, it’s important to know that the Image Generation API supports any model that exactly matches the Stable Diffusion v1.5 architecture. You can use a pretrained or fine-tuned model by converting it to a model format supported by MediaPipe Image Generator using our conversion script.
You can also customize the foundation model via MediaPipe Diffusion LoRA fine-tuning on Vertex AI, allowing you to inject new concepts into the foundation model without having to fine-tune the entire model. For more information on this process, please see the official documentation.
If you would like to try this task today without any customization, we also provide links to several verified working models in that same documentation.
Image generation using diffusion models
The most straightforward way to try out the Image Generator task is to give it a text prompt and receive the resulting image from a diffusion model.
As with any other MediaPipe task, you start by creating an options object. In this case, you only need to define the path to the foundation model files on the device. Once you have that options object, you can create the ImageGenerator.
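A minimal sketch of that setup in Kotlin, following the MediaPipe Tasks Image Generator API (MODEL_PATH is a placeholder for wherever the foundation model files live on the device):

```kotlin
import com.google.mediapipe.tasks.vision.imagegenerator.ImageGenerator
import com.google.mediapipe.tasks.vision.imagegenerator.ImageGenerator.ImageGeneratorOptions

// Point the task at the on-device foundation model directory.
val options = ImageGeneratorOptions.builder()
    .setImageGeneratorModelDirectory(MODEL_PATH)
    .build()

// Create the Image Generator from the options object.
val imageGenerator = ImageGenerator.createFromOptions(context, options)
```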
After creating your new ImageGenerator, you can create a new image by passing in a prompt, the number of iterations the generator should go through, and a seed value. This runs a blocking operation to create the new image, so you will want to run it in a background thread before returning your new Bitmap result object.
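In code, that one-shot call looks roughly like the following sketch, where prompt, iterations, and seed are values you supply and BitmapExtractor comes from MediaPipe’s image framework:

```kotlin
import com.google.mediapipe.framework.image.BitmapExtractor

// Blocking call: run this on a background thread, not the UI thread.
val result = imageGenerator.generate(prompt, iterations, seed)

// Convert the generated MPImage into an Android Bitmap for display.
val bitmap = BitmapExtractor.extract(result?.generatedImage())
```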
In addition to this simple input-in/result-out format, we also support a way to step through each iteration manually via the execute() function, receiving the intermediate result images back at different stages to show the generative progress. Getting intermediate results back isn’t recommended for most apps due to performance and complexity, but it is a nice way to show what’s happening under the hood. This is a slightly more in-depth process, but this demo, as well as the other examples shown in this post, can be found in our official sample app on GitHub.
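As a rough sketch of that stepwise flow, assuming the setInputs()/execute() pattern used in the sample app (the displayEvery constant here is our own, controlling how often an intermediate image is decoded):

```kotlin
// Register the generation request once, then drive the iterations yourself.
imageGenerator.setInputs(prompt, iterations, seed)

for (step in 0 until iterations) {
    // Only decode an intermediate image every few steps, since decoding is costly.
    val showResult = (step + 1) % displayEvery == 0 || step == iterations - 1
    val result = imageGenerator.execute(showResult)
    if (showResult) {
        val intermediate = BitmapExtractor.extract(result?.generatedImage())
        // Hand the intermediate bitmap to the UI to visualize progress.
    }
}
```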
Image generation using plugins
Being able to create a new image from only a prompt, entirely on-device, is already a big step. Now we’ve taken it a step further by implementing a new plugin system that allows the diffusion model to accept a condition image along with a text prompt as its inputs.
Currently, we support three different methods that can provide the basis for generations: facial structure, edge detection, and depth perception. Plugins allow you to provide an image, extract certain structures from it, and use those structures to create new images.
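As a sketch of how a plugin is wired in, here using the facial structure plugin: the .task and .tflite asset names below are examples (see the documentation for the bundled plugin models), and options is the same options object created earlier.

```kotlin
// Condition options pairing the face landmark detector with its plugin model.
val faceConditionOptions = ConditionOptions.FaceConditionOptions.builder()
    .setFaceModelBaseOptions(
        BaseOptions.builder().setModelAssetPath("face_landmarker.task").build())
    .setPluginModelBaseOptions(
        BaseOptions.builder().setModelAssetPath("face_landmark_plugin.tflite").build())
    .build()

val conditionOptions = ConditionOptions.builder()
    .setFaceConditionOptions(faceConditionOptions)
    .build()

// Create a generator that accepts a condition image alongside the prompt.
val imageGenerator = ImageGenerator.createFromOptions(context, options, conditionOptions)

// Extract the facial structure from a source image, then generate with it.
val conditionImage = imageGenerator.createConditionImage(sourceImage, ConditionType.FACE)
val result = imageGenerator.generate(prompt, conditionImage, ConditionType.FACE, iterations, seed)
```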
LoRA weights
The third major feature we’re rolling out today is the ability to customize the Image Generator task with LoRA weights, teaching the foundation model new concepts, such as specific objects, people, or styles presented during training. With the new LoRA weights, the Image Generator becomes a specialized generator that can inject specific concepts into the generated images.
LoRA weights are useful for cases where you want every image to be in the style of an oil painting, or a particular teapot to appear in any created setting. You can learn more about LoRA weights on Vertex AI in the MediaPipe Stable Diffusion LoRA model card and create them using this notebook. Once you generate LoRA weights, you can deploy them on-device using the MediaPipe Tasks Image Generator API, or use them for optimized server inference through one-click deployment on Vertex AI.
In the example below, we created LoRA weights using several images of a teapot from the Dreambooth teapot training image set, then used the weights to generate new images of the teapot in different settings.
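On-device, pointing the task at LoRA weights is a small change to the options object. A sketch, assuming the setLoraWeightsFilePath option described in the documentation (LORA_WEIGHTS_PATH is a placeholder for the weights file on the device):

```kotlin
// Same foundation model as before, plus the customized LoRA weights.
val options = ImageGeneratorOptions.builder()
    .setImageGeneratorModelDirectory(MODEL_PATH)
    .setLoraWeightsFilePath(LORA_WEIGHTS_PATH)
    .build()

val imageGenerator = ImageGenerator.createFromOptions(context, options)

// Prompts mentioning the trained concept (here, the teapot) now specialize the output.
val result = imageGenerator.generate(prompt, iterations, seed)
```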
Next steps
This is just the beginning of the features we plan to support with on-device image generation. We look forward to seeing all the great things the developer community builds, so be sure to post them on X (formerly Twitter) with the hashtag #MediaPipeImageGen and tag @GoogleDevs. You can check out the official samples on GitHub that demonstrate everything you’ve learned so far, read the official documentation for even more details, and keep an eye on the Google for Developers YouTube channel for updates and tutorials as they are released by the MediaPipe team.
Acknowledgements
We would like to thank all team members who contributed to this work: Lu Wang, Yi-Chun Kuo, Sebastian Schmidt, Kris Tonthat, Jiuqiang Tang, Khanh LeViet, Paul Ruiz, Qifei Wang, Yang Zhao, Yuqi Li, Lawrence Chan, Tingbo Hou, Joe Zou, Raman Sarokin, Juhyun Lee, Geng Yan, Ekaterina Ignasheva, Shanthal Vasanth, Glenn Cameron, Mark Sherwood, Andrei Kulik, Chuo-Ling Chang, and Matthias Grundmann from the Core ML team, as well as Changyu Zhu, Genquan Duan, Bo Wu, Ting Yu, and Shengyang Dai from Google Cloud.