A friendly guide to local AI image gen with Stable Diffusion and Automatic1111

Hands On The launch of Microsoft’s Copilot+ AI PCs brought with it a load of machine-learning-enhanced functionality, including an image generator built right into MS Paint that runs locally and turns your doodles into art.

The only problem is that you’ll need a shiny new Copilot+ AI PC to unlock these features. Well, to unlock Microsoft Cocreate anyway. If you’ve got a remotely modern graphics card, or even a decent integrated one, you’ve (probably) got everything you need to experiment with AI image generation locally on your machine.

Since their debut nearly two years ago, Stability AI’s Stable Diffusion models have become the go-to for local image generation, owing to their incredibly compact size, relatively permissive license, and ease of access. Unlike proprietary models such as Midjourney or OpenAI’s DALL-E, you can download the model and run it yourself.

Because of this, a slew of applications and services have cropped up over the past few years designed to make deploying Stable Diffusion-derived models more accessible on all manner of hardware.

In this tutorial, we’ll be looking at how diffusion models actually work and exploring one of the more popular apps for running them locally on your machine.

Prerequisites:

Automatic1111’s Stable Diffusion Web UI runs on a wide range of hardware, and compared to the software in some of our other hands-on AI tutorials, it’s not terribly resource-intensive either. Here’s what you’ll need:

  • For this guide you’ll need a Windows or Linux PC (we’re using Ubuntu 24.04 and Windows 11) or an Apple Silicon Mac.
  • A compatible Nvidia or AMD graphics card with at least 4GB of vRAM. Any reasonably modern Nvidia card or most 7000-series Radeons (some higher-end 6000-series cards may work too) should work without issue. We tested with Nvidia’s Tesla P4, RTX 3060 12GB, and RTX 6000 Ada Generation, as well as AMD’s RX 7900 XT. If you’re not sure how much memory your card has, see the quick check after this list.
  • The latest graphics drivers for your particular GPU.
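
Not sure how much video memory you’re working with? On Nvidia cards, the nvidia-smi tool that ships with the drivers will tell you; AMD users can check their card’s spec sheet, or use radeontop once it’s installed later on. A minimal check, assuming the drivers are already in place:

# Nvidia only: report the GPU model and total video memory
nvidia-smi --query-gpu=name,memory.total --format=csv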

The basics of diffusion models

Before we jump into deploying and running diffusion models, it’s probably worth taking a high-level look at how they actually work.

In a nutshell, diffusion models have been trained to take random noise and, through a series of denoising steps, arrive at a recognizable image or audio sample that’s representative of a specific prompt.

The process of training these models is also fairly straightforward, at least conceptually. A large catalog of labeled images, graphics, or sometimes audio samples — often ripped from the internet — is imported and increasing levels of noise are applied to them. Over the course of millions, or even billions, of samples the model is trained to reverse this process, going from pure noise to a recognizable image.

During this process both the data and their labels are converted into associated vectors. These vectors serve as a guide during inferencing. Asked for a “puppy playing in a field of grass,” the model will use this information to guide each step of the denoising process toward the desired outcome.

To be clear, this is a gross oversimplification, but it provides a basic overview of how diffusion models are able to generate images. There’s a lot more going on under the hood, and we recommend checking out Computerphile’s Stable Diffusion explainer if you’re interested in learning more about this particular breed of AI model.

Getting started with Automatic1111

Arguably the most popular tool for running diffusion models locally is Automatic1111’s Stable Diffusion Web UI.

Automatic1111’s Stable Diffusion Web UI provides access to a wealth of tools for tuning your AI-generated images

As the name suggests, the app provides a straightforward, self-hosted web GUI for creating AI-generated images. It supports Windows, Linux, and macOS, and can run on Nvidia, AMD, Intel, and Apple Silicon with a few caveats that we’ll touch on later.

The actual installation varies, depending on your OS and hardware, so feel free to jump to the section relevant to your setup.

Note: To make this guide easier to consume we’ve broken it into four sections:

  1. Introduction and installation on Linux
  2. Getting up and running on Windows and macOS
  3. Using the Stable Diffusion Web UI
  4. Integration and conclusion

Intel graphics support

At the time of writing, Automatic1111’s Stable Diffusion Web UI doesn’t natively support Intel graphics. There is, however, an OpenVINO fork that does, on both Windows and Linux. Unfortunately, we were unable to test this method, so your mileage may vary. You can find more information on the project here.

Installing Automatic1111 on Linux — AMD and Nvidia

To kick things off, we’ll start with getting the Automatic1111 Stable Diffusion Web UI – which we’re just going to call A1111 from here on out – up and running on an Ubuntu 24.04 system. These instructions should work for both AMD and Nvidia GPUs.

If you happen to be running a different flavor of Linux, we recommend checking out the A1111 GitHub repo for more info on distro-specific deployments.

Before we begin, we need to install a few dependencies, namely git and the software-properties-common package:

sudo apt install git software-properties-common -y

We’ll also need to grab Python 3.10. For better or worse, Ubuntu 24.04 doesn’t include this release in its repos, so we’ll have to add the Deadsnakes PPA before we can pull the packages we need.

sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt install python3.10-venv -y
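
Before going any further, it’s worth a quick check that the 3.10 interpreter actually landed (the venv package pulls it in as a dependency):

python3.10 --version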

Note: In our testing, we found AMD GPUs required a few extra packages to get working, plus a restart.

#AMD GPUS ONLY
sudo apt install libamd-comgr2 libhsa-runtime64-1 librccl1 librocalution0 librocblas0 librocfft0 librocm-smi64-1 librocsolver0 librocsparse0 rocm-device-libs-17 rocm-smi rocminfo hipcc libhiprand1 libhiprtc-builtins5 radeontop
# AMD GPUS ONLY
sudo usermod -aG render,video $USER
# AMD GPUS ONLY
sudo reboot
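
Once you’re back from the reboot, you can optionally confirm that ROCm can see your card using the rocminfo tool installed above. The exact output varies between ROCm versions, but your GPU should show up as one of the listed agents:

# AMD GPUS ONLY - optional sanity check that ROCm can see your GPU
rocminfo | grep -i "marketing name"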

With our dependencies sorted out, we can now pull down the A1111 web UI using git and create a Python 3.10 virtual environment for it. Creating the venv ourselves ensures the launch script picks up Python 3.10 rather than the newer interpreter Ubuntu 24.04 ships by default.

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui && cd stable-diffusion-webui
python3.10 -m venv venv

Finally, we can launch the web UI by running the following.

./webui.sh

The script will begin downloading relevant packages for your specific system, as well as pulling down the Stable Diffusion 1.5 model file. Once it’s finished, the dashboard should be available in your browser at http://127.0.0.1:7860.

If the Stable Diffusion Web UI fails to load on AMD GPUs, you may need to modify the webui-user.sh file to override the GPU version ROCm reports. This appears to be related to device support in the version of ROCm that ships with A1111, and, as we understand it, should be resolved when the app transitions to ROCm 6 or later.

# AMD GPUS ONLY
echo "export HSA_OVERRIDE_GFX_VERSION=11.0.0" >> ~/stable-diffusion-webui/webui-user.sh

If you’re still having trouble, check out our “Useful launch flags” section for additional tips.

In the next section, we’ll dig into how to get A1111 running in Windows and macOS.

Installing on Windows

If you’re rocking a Windows box, A1111 can be a little hit and miss depending on your hardware. On Windows, the dashboard only natively supports Nvidia hardware. The good news is there are a couple of projects dedicated to getting it running on AMD and Intel-based boxes.

If you’ve got an Nvidia card, you’re in luck, as installing A1111 is as simple as downloading the sd.webui.zip file from the Automatic1111 GitHub page and extracting it.

Then, inside the folder, you just need to run the update.bat and run.bat scripts, and you’ll be generating images in no time at all. Note that you may need to unblock the batch files under file properties before running them; if you’d rather do that from a terminal, see the PowerShell snippet below.
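
As a sketch of that unblock step, PowerShell’s Unblock-File cmdlet clears the same “downloaded from the internet” flag as the file properties checkbox. This assumes you’ve opened a PowerShell prompt in the folder you extracted sd.webui.zip into:

# Clear the Mark-of-the-Web so Windows will run the batch files
Unblock-File .\update.bat
Unblock-File .\run.bat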

AMD

For those of you with AMD cards, Windows support is provided by lshqqytiger’s DirectML-compatible fork of the SD Web UI project. The bad news is that the DirectML implementation is much, much slower than native ROCm on Linux.

AMD has previously recommended using the ONNX runtime with Microsoft’s Olive library. Unfortunately, we didn’t have much luck getting this method to work in our testing with a Radeon RX 7900 XT.

If you just want to play around with image generation on AMD/Windows and don’t mind waiting a while, here’s how you can get it up and running.

Start by downloading and installing the dependencies, namely Git for Windows and Python 3.10 (the fork, like upstream A1111, expects a 3.10.x release).

Next use git to pull down the latest version of A1111 for AMD:

git clone https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu.git && cd stable-diffusion-webui-amdgpu
git submodule update --init --recursive

From there, we want to launch A1111 using the --use-directml flag.

.\webui.bat --use-directml

The installer will go to work and after a few minutes you should be greeted by the Web GUI in your browser. Alternatively, you can try out Zluda — a project that aims to let you run unmodified CUDA code on AMD graphics chips — but we’ll note that while performance is much better once set up, we did run into some odd behavior.

If you’ve already launched A1111 using DirectML, you’ll need to remove the .\venv\ folder first to avoid conflicts during the install.

rm -r .\venv\

Next, you’ll need to download and install the latest HIP SDK — version 5.7.1 as of writing — from AMD’s website. It’s probably worth rebooting at this point for good measure.

Finally, launch A1111 with the --use-zluda flag and go make a cup of coffee or two — it’s gonna be a while.

.\webui.bat --use-zluda

Once the dashboard is running, don’t be surprised if it takes 15 minutes or more to generate your first image. All subsequent images should be quite snappy, but the first one just seems to take forever. We’re not the only ones that have run into this issue, so it appears that something is going on behind the scenes.

It’s also worth noting that the Zluda project has been on uncertain footing ever since AMD stopped funding it earlier this year. We also ran into some other odd behavior and freezes when running in this mode. That’s not to say things won’t improve, just don’t be surprised if you run into some jank while testing it out.

And macOS

If you happen to be running on macOS, the installation is similarly straightforward. Before you begin, you’ll need to make sure the Homebrew package manager is installed on your system. You can find more information on setting up Homebrew here. Using brew, we’ll install a few dependencies and pull A1111 down from GitHub.

brew install cmake protobuf rust python@3.10 git wget
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui

As part of the installation, we also need to pull down the Stable Diffusion 1.5 model file and place it in the ~/stable-diffusion-webui/models/Stable-diffusion/ folder. We can achieve this using the following wget command.

wget -O stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.ckpt https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.ckpt

Finally, we can navigate to the stable-diffusion-webui folder and launch the web server.

cd stable-diffusion-webui
./webui.sh

After about a minute or so you should be greeted with the A1111 dashboard.

In the next section, we’ll move onto generating images from text or image prompts and touch on using inpainting to target your edits.

Generating your first image

A1111 is an incredibly feature-rich platform for image generation, with support for text-to-image, image-to-image, and even fine-tuning. For the purposes of this tutorial we’ll be sticking to the basics, though we may explore some of the more advanced features in later coverage. Be sure to let us know in the comments if that’s something you’d like to see.

When you load the dashboard, you’ll be dropped onto the “txt2img” tab. Here, you can generate images based on positive and negative text prompts, such as “a detailed painting of a dog in a field of flowers.”

Generating an image only takes a few seconds with the right prompt and a few tweaks

The more detailed and specific your prompt, the more likely it is that you’re going to find success generating a usable image.

Below this, you’ll find a number of tabs, sliders, and drop down menus, most of which change the way the model goes about generating an image.

The three settings to keep your eye on are Sampling Method, Sampling Steps, and CFG Scale

The three that we find have the biggest impact on the look and quality of the images are:

  1. Sampling Method — Different sampling methods will produce images with a different look and feel. These are worth playing around with if you’re having a hard time achieving a desired image.
  2. Sampling Steps — This defines the number of iterations the model should go through when generating an image. The more sampling steps, the higher the image quality will be, but the longer it’ll take to generate.
  3. CFG Scale — This slider determines how closely the model should adhere to your prompt. A lower CFG allows the model to be more creative, while a higher one will follow the prompt more closely.

Finally, it’s also worth paying attention to the image seed. By default, A1111 will use a random seed for every image you generate, and the output will vary with the seed. If the model generates an image that’s close to what you’re going for, and you just want to refine it, you can lock the seed by pressing the “recycle” button before making additional adjustments or experimenting with different sampling methods.

The “Width” and “Height” sliders are mostly self-explanatory, but it’s worth noting that generating larger images requires additional memory. So, if you’re running into crashes when trying to generate larger images, you may not have adequate memory, or you may need to launch the Web UI with one of the low or medium-memory flags. See our “Useful launch flags” section for more information.

The “Batch count” and “Batch size” govern how many images should be generated. If, for example, you set the batch count to two and the batch size to four, the model will generate eight images, four at a time.

It’s worth noting that the larger the batch size, the more memory you’re going to need. In our testing we needed just under 10GB of vRAM at a batch size of four. So if you don’t have a lot of vRAM to work with, you may want to stick to adjusting the batch count and generating images one at a time.

If you do have a little extra memory, you can take advantage of more advanced features such as “Hires. Fix,” which initially generates a low-resolution image and then upscales by a set factor. The Refiner is similar in concept, but switches to a second “refiner” model part of the way through generating the image.

Image-to-image generation

In addition to standard text-to-image generation, A1111 also supports a variety of image-to-image, inpainting, and sketch-to-image features, similar to those seen in Microsoft’s Cocreate.

Opening A1111’s “img2img” tab you can supply a source image along with positive and negative prompts to achieve a specific aesthetic. This image could be one of your own, or one you generated from a text prompt earlier.

You can also transform existing or previously generated images into new creations using A1111

For example, if you wanted to see what a car might look like in a cyberpunk future, you could do just that, by uploading a snap of the car and providing a prompt like “reimagine in a cyberpunk aesthetic.”

Most of the sliders in A1111’s image-to-image mode should look familiar, but one worth pointing out is “denoising strength” which tells the model how much of the original image it should use when generating a new one. The lower the denoising strength, the less visible the changes will be, while the higher you set it, the more liberty the model has to create something new.

Alongside straight image-to-image conversions, you can also use A1111’s inpainting and sketch functions to more selectively add or remove features. Going back to our previous example, we could use Inpaint to tell the model to focus on the car’s headlights, then prompt it to reimagine them in a different style.

Using inpainting, it’s possible to target the AI model’s generative capabilities on specific areas of the image, such as this car’s headlights

These can get rather involved quite quickly and there are even custom models available specifically designed for inpainting. So in that respect, we suppose Microsoft has made things a lot simpler with Cocreate.

In fact, there’s a whole host of features built into A1111, many of which, like LoRA training, are beyond the scope of this tutorial, so we recommend checking out the project’s wiki here for a full rundown of its capabilities.

Adding models

By default the SD Web UI will pull down the Stable Diffusion 1.5 model file, but the app also supports running the newer 2.0 and XL releases, as well as a host of alternative community models.

At the time of writing there’s still no support for Stability’s third-gen model, but we expect that’ll change before too long. With that said, between its more restrictive license and habit of turning people into Lovecraftian horrors, you may not want to anyway.

Adding models is actually quite easy, and simply involves downloading the appropriate safetensors or checkpoint model file and placing it in the ~/stable-diffusion-webui/models/Stable-diffusion folder.

If you’ve got a GPU with 10GB or more of vRAM, we recommend checking out Stable Diffusion XL Base 1.0 on Hugging Face, as it generates much higher-quality images than previous models. Just make sure you set the resolution to 1024×1024 for the best results.
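
If you’d rather grab it from the command line, something along these lines should drop the base model into the right folder. This assumes you’re in the stable-diffusion-webui directory and that the file layout on Hugging Face hasn’t changed:

wget -O models/Stable-diffusion/sd_xl_base_1.0.safetensors https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors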

You can also find custom community models in repositories such as CivitAI that have been fine-tuned to match a certain style. As we alluded to earlier with Stable Diffusion 3, different models are subject to different licenses, which may restrict whether they can be used for research, personal, or commercial purposes. So you’ll want to review the terms before using these models, or the images they generate, in public projects or business applications. As for community models, there’s also the question of whether or not they’ve been created using copyrighted material.

Useful launch flags

If you happen to be running on a remote system, you may want to pass the --listen flag to expose the web UI to the rest of your network. When using this flag, you’ll need to manually navigate to http://<your-server-ip>:7860 to access the Web UI. Please be mindful of security.

./webui.sh --listen

If you run into any trouble launching the server, your graphics card may not have adequate memory to run using the default parameters. To get around this we can launch the model with either --medvram or --lowvram which should help avoid crashing on systems with 4GB of video memory.

Some cards also struggle when running at lower precisions and may benefit from running with the --precision full and/or --no-half flags enabled.
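
For example, on a card with around 4GB of video memory that also struggles at half precision, you might combine the flags like so, dropping whichever ones your hardware doesn’t need:

./webui.sh --medvram --precision full --no-half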

If you’re still having trouble, check out Automatic1111’s docs on GitHub for more recommendations.

Integrating Automatic1111 into Open WebUI

If you followed our recent tutorial on building a ChatGPT-style dashboard for retrieval augmentation, you may be interested to know that Automatic1111 can be integrated directly into Open WebUI, allowing you to generate images directly from the chat interface.

To get started, you’ll need to launch the Automatic1111 SD Web UI using the --api and --listen flags.

./webui.sh --api --listen
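
If you want to confirm the API side of things is responding before wiring it into Open WebUI, you can hit the txt2img endpoint the --api flag exposes directly from a terminal. Here’s a quick sketch using curl and jq on our Linux box, which decodes the base64-encoded image the API returns; adjust the address and prompt to taste:

curl -s http://127.0.0.1:7860/sdapi/v1/txt2img -H "Content-Type: application/json" -d '{"prompt": "a puppy playing in a field of grass", "steps": 20}' | jq -r '.images[0]' | base64 --decode > puppy.png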

Once connected in the Open WebUI dashboard, you can start generating images directly from your LLM chats

From there, log into the Open WebUI dashboard and open up the Admin Settings panel. Select “Images” from the sidebar and set “Automatic1111” as the generation engine. Finally, set the Automatic1111 base URL to:

http://127.0.0.1:7860

Note: If you are running Automatic1111 on a different machine, you’ll want to change 127.0.0.1 to that machine’s IP address or hostname. Again, please be mindful of security and any firewalls in the way.

If everything is configured correctly, you should be able to select your preferred model, image size, and the number of sampling steps that should be used.

Turns out that AI chatbots are pretty good at generating AI image prompts, go figure

To test it out, start a new chat and ask your large language model of choice to generate an image prompt for you. We recommend starting your prompt with “generate a short two-sentence image prompt for…”

Once it responds, click the image button below the reply, which should send it to the SD Web UI for processing. After a few moments, an image should appear in the chat.

Closing thoughts

Like most AI technologies out there today, image generation is controversial. In the right hands it can be a powerful creative tool. Unfortunately in the wrong ones it also has the potential to displace artists, or worse, be used to create disinformation, propaganda, and other harmful or inappropriate content. We’ve documented some of this technology here not as an endorsement but to help dispel the hype and show our readers how this stuff works, and how others are using it, whether one personally approves of its use or not.

Efforts are being made to make these models safer and less biased, but we’re still very much in the early days of the technology and there are sure to be growing pains. Who can forget when Google had to pull its image-gen feature from Gemini after it started creating ethnically diverse images in entirely inappropriate historical contexts?

There’s also the issue of how these models are created in the first place. We focused heavily on Stable Diffusion in this piece because its creator made the models available to the public under an incredibly permissive license. However, it seems that in creating these models Stability AI may have violated the copyrights of artists whose work appeared on the web.

The model builder is now facing multiple copyright infringement cases brought by Getty Images and groups of artists, who claim their works were used without permission to train its signature model. What this could mean for Stability AI and the broader industry remains to be seen.

In any case, if you do choose to embrace generative AI art, we encourage you to do so in the most respectful and responsible manner possible.

The Register aims to bring you more AI content soon, so be sure to share your burning questions in the comments section. And, if you haven’t already, be sure to check our other local AI guides. ®


Editor’s Note: Nvidia provided The Register with an RTX 6000 Ada Generation graphics card to support this story and others like it. Nvidia had no input as to the contents of this article.
