Figure 1: Stable diffusion image results for a waterfall image and the text prompt "Mars Waterfalls". Run on Intel's Data Center GPU Max 1100. Image by author.

Stable diffusion models have become a great way for creators, artists, and designers to rapidly prototype visual ideas without outside help. If you have used stable diffusion models, you may be familiar with providing text prompts to generate images.

Some models can also use an image, together with a text prompt, as a starting point for generating new images. This article describes how we performed image-to-image stable diffusion model inference on the just-released Intel® Data Center GPU Max 1100.

We ran two different SD models for image-to-image generation, both hosted on Hugging Face. These models are primarily used for text-to-image generation, but both work equally well for image-to-image generation (a loading sketch follows the list):

  1. Stability AI Stable Diffusion v2–1
  2. Runway ML Stable Diffusion v1–5
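
Both checkpoints load with Hugging Face's diffusers library. Below is a minimal sketch of my own (not code from the article); the model IDs are the public Hugging Face repository names, and the float16 dtype is an assumption to reduce memory use.

import torch
from diffusers import StableDiffusionImg2ImgPipeline

# Either public model ID works with the same image-to-image pipeline class.
model_id = "stabilityai/stable-diffusion-2-1"  # or "runwayml/stable-diffusion-v1-5"
pipeline = StableDiffusionImg2ImgPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
)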

The Stability AI Stable Diffusion v2–1 model was trained on a cluster of 32 x 8 x A100 GPUs (256 GPU cards total) and was fine-tuned from the Stable Diffusion v2 model.

The original dataset was a subset of the LAION-5B dataset created by the DeepFloyd team at Stability AI. At the time of this writing, LAION-5B is the largest text-image pair dataset to date, containing over 5.85 billion text-image pairs. Figure 4 shows some samples from the dataset.

Figure 4: Samples from the LAION-5B dataset for a cat example. Image source.

From the samples shown, you can see that the original images come in different pixel sizes. In practice, however, training these models typically requires padding or resizing the images to a constant pixel size to fit the model architecture (see the sketch below).
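
As an illustration only (my sketch, not the actual training code), this is the kind of resize-and-center-crop step such pipelines commonly use to produce fixed-size training images:

from PIL import Image

def to_fixed_size(img: Image.Image, size: int = 512) -> Image.Image:
    # Scale the shorter side to `size`, then center-crop to size x size.
    w, h = img.size
    scale = size / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)))
    left = (img.width - size) // 2
    top = (img.height - size) // 2
    return img.crop((left, top, left + size, top + size))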

The breakdown of the dataset is as follows:

  • laion2B-en: 2.32 billion English text and image pairs
  • laion2B-multi: 2.26 billion text and image pairs from over 100 languages
  • laion1B-nolang: 1.27 billion text-image pairs with undetectable languages

Following the training path for these models is a bit complicated, but here’s the full story:

  • Stable Diffusion 2-Base was trained from scratch with 550K steps on 256×256 pixel images, filtered for pornographic material, and then trained with an additional 850K steps on 512×512 pixel images.
  • Stable Diffusion v2 resumed training where Stable Diffusion 2-Base left off and was trained with an additional 150K steps on 512×512 pixel images, followed by another 140K steps on 768×768 pixel images.
  • Stability AI Stable Diffusion v2–1 was further fine-tuned from Stable Diffusion v2, first for 55K steps and then for another 155K steps, using two different explicit-material filter settings.

For training details, see the Stability AI Stable Diffusion v2–1 Hugging Face model card here. Note that I have repeated this explanation from my previous article on text-to-image stable diffusion, since it is the same model.

The Runway ML Stable Diffusion v1–5 model was fine-tuned from an earlier Stable Diffusion v1 checkpoint (v1–2), with an additional 595K training steps at a resolution of 512×512. One of its advantages is that it is relatively lightweight: "With 860M UNet and 123M text encoder, this model is relatively lightweight and runs on a GPU with at least 10 GB VRAM." The Max 1100 GPU has 48 GB of VRAM, which is plenty for this model.
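
To confirm that a card clears that 10 GB requirement, a quick check like the following works; this is a sketch assuming Intel Extension for PyTorch is installed and exposes device properties through its torch.xpu namespace.

import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device with PyTorch

# Report the total memory of the first Intel GPU in GiB.
props = torch.xpu.get_device_properties(0)
print(f"{props.total_memory / 1024**3:.1f} GiB of VRAM available")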

Intel GPU hardware

As mentioned earlier, the GPU used for the inference tests is an Intel Data Center GPU Max 1100 with 48 GB of memory, 56 Xe cores, and 300 W of thermal design power.

At the command line, you can first verify that you actually have the GPU you expect by running the following command:

clinfo -l

You should see output showing that you have access to four Intel GPUs on the current node (a device-pinning sketch follows the listing):

Platform #0: Intel(R) OpenCL Graphics
+-- Device #0: Intel(R) Data Center GPU Max 1100
+-- Device #1: Intel(R) Data Center GPU Max 1100
+-- Device #2: Intel(R) Data Center GPU Max 1100
`-- Device #3: Intel(R) Data Center GPU Max 1100
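
With four cards on one node, you often want to pin a process to a single device. One common approach (an assumption on my part, not something the article covers) is the Level Zero ZE_AFFINITY_MASK environment variable, set before any GPU runtime initializes:

import os

# Expose only device 0 to this process; set this before importing
# torch or intel_extension_for_pytorch so initialization sees one GPU.
os.environ["ZE_AFFINITY_MASK"] = "0"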

Similar to nvidia-smi, you can run the xpu-smi utility with a few command-line options to get the statistics you want about GPU usage:

xpu-smi dump -d 0 -m 0,5,18

This prints key GPU usage statistics for device 0 (-d 0) every second; the -m 0,5,18 options select GPU utilization, GPU memory utilization, and GPU memory used:

getpwuid error: Success
Timestamp, DeviceId, GPU Utilization (%), GPU Memory Utilization (%), GPU Memory Used (MiB)
13:34:51.000, 0, 0.02, 0.05, 28.75
13:34:52.000, 0, 0.00, 0.05, 28.75
13:34:53.000, 0, 0.00, 0.05, 28.75
13:34:54.000, 0, 0.00, 0.05, 28.75
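
To capture these statistics while a generation job runs, one option (my sketch, not from the article) is to launch the same dump command in the background from Python:

import subprocess

# Log device 0 utilization to a CSV file while other work runs.
with open("gpu_usage.csv", "w") as log:
    monitor = subprocess.Popen(
        ["xpu-smi", "dump", "-d", "0", "-m", "0,5,18"], stdout=log
    )
    # ... run image generation here ...
    monitor.terminate()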

Running a stable diffusion image-to-image sample yourself

My colleague Rahul Nair has created a Stable Diffusion image-to-image Jupyter notebook hosted directly on the Intel Developer Cloud. It gives you the option of using either of the models discussed above. Here are the steps to get started:

  1. Click here to register as a standard user in Intel Developer Cloud.
  2. Once logged in, go to the “Training and Workshops” section.
  3. Click on the GenAI Launch Jupyter Notebook option, where you will find the Image-to-Image Stable Diffusion notebook ready to run.

The notebook uses Intel® Extension for PyTorch* to speed up inference. One of the key functions is _optimize_pipeline, where ipex.optimize is called to optimize the DiffusionPipeline object.

# This method lives inside the notebook's pipeline class; the imports
# below are what its body needs and are assumed from context.
import intel_extension_for_pytorch as ipex
import torch.nn as nn
from diffusers import StableDiffusionImg2ImgPipeline

def _optimize_pipeline(
    self, pipeline: StableDiffusionImg2ImgPipeline
) -> StableDiffusionImg2ImgPipeline:
    """
    Optimize the pipeline of the model.

    Args:
        pipeline (StableDiffusionImg2ImgPipeline): The pipeline to optimize.

    Returns:
        StableDiffusionImg2ImgPipeline: The optimized pipeline.
    """
    # Walk the pipeline's attributes and run ipex.optimize on every
    # torch.nn.Module component (UNet, text encoder, VAE, and so on),
    # matching each module's dtype to the text encoder's dtype.
    for attr in dir(pipeline):
        if isinstance(getattr(pipeline, attr), nn.Module):
            setattr(
                pipeline,
                attr,
                ipex.optimize(
                    getattr(pipeline, attr).eval(),
                    dtype=pipeline.text_encoder.dtype,
                    inplace=True,
                ),
            )
    return pipeline
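
For context, here is a hedged sketch (mine, not the notebook's code) of how the optimized pipeline might then be run on the XPU device with an input image and a text prompt; the URL and file name are placeholders.

import requests
from io import BytesIO
from PIL import Image

image_url = "https://example.com/waterfall.jpg"  # placeholder; use any image
init_image = (
    Image.open(BytesIO(requests.get(image_url).content))
    .convert("RGB")
    .resize((768, 512))
)

pipeline = pipeline.to("xpu")  # the device name IPEX registers
result = pipeline(
    prompt="Mars waterfalls",
    image=init_image,
    strength=0.75,       # how far to move away from the input image
    guidance_scale=7.5,  # how strongly to follow the text prompt
).images[0]
result.save("mars_waterfall.png")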

Figure 5 shows a convenient mini user interface inside the Jupyter notebook itself for generating images from images. You can start creating your own images by selecting one of the models, entering the URL of a source image, filling in the prompt, and choosing the number of images to generate (a widget sketch follows the figure caption).

Figure 5: Mini user interface of the image-to-image interface in Jupyter Notebook. Image by author.
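
Such a mini UI is typically just a handful of ipywidgets controls. The sketch below is my guess at the general shape of an interface like this, not the notebook's actual code.

import ipywidgets as widgets
from IPython.display import display

model = widgets.Dropdown(
    options=["stabilityai/stable-diffusion-2-1", "runwayml/stable-diffusion-v1-5"],
    description="Model",
)
image_url = widgets.Text(description="Image URL")
prompt = widgets.Text(description="Prompt")
num_images = widgets.IntSlider(value=1, min=1, max=5, description="Images")
display(widgets.VBox([model, image_url, prompt, num_images]))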

Figures 1 and 2 show sample results: brand-new images generated from a text + image prompt on this Intel GPU. I thought it would be fun to start with a real terrestrial nature photo of a waterfall and ask for a "Martian waterfall" to see how the red color adapts to the landscape (Figure 1).

Next, in Figure 2, the model transformed an image of Jupiter to have Earth’s continental structure, but some of Jupiter’s unique features were still retained and colored red.

Figure 2: Stable diffusion image results for an image of the planet Jupiter and a text prompt for “Earth”. Runs on Intel’s latest Data Center GPU Max 1100. Image by author.

These images can be generated through the notebook, and inference completes in seconds. Connect with me below and share your images with me on social media. Also, let me know if you have any questions or need help getting started with Stable Diffusion.

Stable diffusion models are powerful tools for high-resolution image synthesis, including text-to-image and image-to-image conversion. Although these are designed to produce high-quality results, users should be aware of potential limitations.

  • Variation in quality: The quality of the generated images can vary based on the complexity of the input text or image and the consistency of the model with the training data.
  • License and Usage Restrictions: Please carefully review the license information associated with each model to ensure compliance with all terms and conditions.
  • Ethical considerations: Consider the ethical implications of the content produced, especially in contexts that may include sensitive or controversial subject matter.

Please refer to each model card for detailed information on each model’s features, limitations, and best practices.

