So I have Automatic1111 running decently. Can anyone recommend good tutorials on different models, setups, and workflows? There is so much spammy shit out there that it's just making me frustrated. Not a total noob at this, but I'd like to build some good fundamental practices. Not for work, just want to make cool shit.
Well, I’m no expert, but I’ve been playing with SD for some months, and I guess I can tell you what I wish I’d known.
I don’t know how much there is by way of generally-recognized best practices, partly because things are new, and partly because things are changing very rapidly from model to model. I mean, go back six months and Stable Diffusion couldn’t render letters coherently. Now you get words with frequent misspellings. There are new extensions coming out. Even the base models have changed significantly in what kind of effect one will get for a given prompt.
Can you maybe give an idea of what it is that you’re looking to get a start on? The syntax used for prompts? Prompt phrasing? What extensions to use? What the various parameters do? Explanations of what things like LoRAs are? Ways to deal with common problems (e.g. too many fingers, getting “mutant monsters” where one person is combined with another, or showing a closeup rather than a full-body shot of a person)?
There wasn’t any all-in-one tutorial that I ran into. I picked up information in various places: looked at Reddit, looked at some websites. So I can’t direct you to any one thing there.
For prompt ideas, I had good luck looking at civitai.com, as they include the prompt text with their images, as well as the model used. I think it’s a nice starting point; it shows you what other people are doing and what effect it has.
For models, again, I tended to just look for what was popular on civitai.com. New ones are coming out all the time, and it depends tremendously on what you want to do. The base SDXL model is fine for many things.
As for extensions, here are the ones I’ve investigated, found useful, and would recommend others look at:
Clip Interrogator. This will “reverse” an image into something like a prompt, which is useful for helping one understand what prompts might generate a given effect or image.
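If you ever want to script that reverse step outside the UI, here’s a minimal sketch using the standalone clip-interrogator Python package, which does the same image-to-prompt job; the model name and file name below are just example choices, not anything the extension requires:

```python
# Sketch only: reverse an image into a prompt-like description with the
# standalone clip-interrogator package (pip install clip-interrogator).
# "ViT-L-14/openai" is the CLIP model commonly paired with SD 1.5.
from PIL import Image
from clip_interrogator import Config, Interrogator

ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
image = Image.open("some_render.png").convert("RGB")  # hypothetical file
print(ci.interrogate(image))  # prints text you can paste into a prompt box
```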
Ultimate SD Upscale (ultimate-upscale-for-automatic1111). If you want, say, a 2560x1440 image, you probably aren’t going to have much luck generating that directly; it’d require a great deal of VRAM. I’d recommend generating images at the resolution that your model was trained at (for SD 1.5, 512x512, for SDXL, 1024x1024). Then upscale that image. This extension can do the upscale in “chunks” and merge the output, so that you can generate very large images – given enough time – without requiring much VRAM.
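To make the “chunks” idea concrete, here’s a toy sketch of tiled upscaling. It is not the extension’s actual code (the real thing also blends seams and runs a diffusion pass over each tile), just the shape of the approach:

```python
# Toy illustration of chunked upscaling: process fixed-size tiles one at a
# time so peak VRAM stays bounded, then paste the results into the final canvas.
# upscale_tile is a stand-in for whatever upscaler/model handles a single tile.
from PIL import Image

TILE = 512  # arbitrary tile size for the sketch

def upscale_in_chunks(img, scale, upscale_tile):
    out = Image.new("RGB", (img.width * scale, img.height * scale))
    for y in range(0, img.height, TILE):
        for x in range(0, img.width, TILE):
            box = (x, y, min(x + TILE, img.width), min(y + TILE, img.height))
            tile = upscale_tile(img.crop(box), scale)  # upscale one chunk
            out.paste(tile, (x * scale, y * scale))    # merge into the result
    return out

# e.g. upscale_in_chunks(Image.open("base_1024.png"), 2, my_upscaler)
```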
Workflows? Well, I personally start with a short prompt and expand slowly. If one gets mangled-looking images, it can be useful to increase the number of “steps”. Put the terms that define the overall scene you want at the beginning of the prompt (e.g. “woman riding a motorcycle”), and less-important terms (“red helmet”) later in the prompt. Proximity of terms in the prompt does matter. I set my seed to 0, so that I can have a reproducible image for a given prompt and set of settings.
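If it helps to see those settings in one place: the web UI exposes an HTTP API when launched with --api, and a txt2img call looks roughly like this (the endpoint and field names are the standard /sdapi/v1/txt2img ones; the prompt, host, and values are just examples):

```python
# Minimal sketch of a txt2img call against a local Automatic1111 instance
# started with --api. Adjust host/port and values to taste.
import base64
import requests

payload = {
    # scene-defining terms first, less-important details later
    "prompt": "woman riding a motorcycle, red helmet, city street at dusk",
    "negative_prompt": "blurry, extra fingers",
    "steps": 30,    # raise this if images come out mangled
    "seed": 0,      # fixed seed -> reproducible image for a given prompt and settings
    "width": 1024,  # match the model's training resolution (512 for SD 1.5)
    "height": 1024,
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
r.raise_for_status()
with open("out.png", "wb") as f:
    f.write(base64.b64decode(r.json()["images"][0]))  # images come back base64-encoded
```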
When upscaling with Ultimate SD Upscale in the img2img tab, use only a low denoising strength, or else you’ll wind up with a different image from the one you started with. I use 0.16.
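Through the API that’s the denoising_strength parameter; a plain img2img pass with a low value looks like the sketch below. This doesn’t invoke the Ultimate SD Upscale script itself (which takes its own script arguments), it just shows where the setting lives:

```python
# Sketch: img2img over the API with a low denoising strength, so the output
# stays close to the input image. Assumes the UI is running locally with --api.
import base64
import requests

with open("base_1024.png", "rb") as f:  # hypothetical input image
    init_image = base64.b64encode(f.read()).decode()

payload = {
    "init_images": [init_image],
    "prompt": "woman riding a motorcycle, red helmet",
    "denoising_strength": 0.16,  # low: keep the original composition
    "steps": 30,
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
r.raise_for_status()
```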
To get control over a specific part of an image (e.g. you want someone to be holding a carrot, but not to have carrots on the table in front of them), you may want to look into inpainting; you can remove part of an image and regenerate it with a different prompt from what the overall image uses. Not the first thing I’d start looking at, but I think that it’d be hard to do serious work aimed at achieving a specific goal without using inpainting.
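When you do get around to inpainting, the same img2img endpoint accepts a mask; here’s a rough sketch, where the white-rectangle mask, the file names, and the prompt are purely for illustration:

```python
# Sketch: inpaint one region via the API. White areas of the mask get
# regenerated with the new prompt; black areas are left alone.
import base64
import io
import requests
from PIL import Image, ImageDraw

def to_b64(img):
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode()

base = Image.open("table_scene.png").convert("RGB")  # hypothetical image
mask = Image.new("L", base.size, 0)
ImageDraw.Draw(mask).rectangle((400, 300, 600, 500), fill=255)  # region to redo

payload = {
    "init_images": [to_b64(base)],
    "mask": to_b64(mask),
    "prompt": "a hand holding a carrot",  # prompt for just the masked region
    "denoising_strength": 0.75,
    "inpainting_fill": 1,                 # 1 = start from the original pixels
    "inpaint_full_res": True,             # work at full resolution on the masked area
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
r.raise_for_status()
```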
There’s also an SD extension, “Regional Prompter”, which lets one split an image up into multiple areas and use different prompts for each; another way to get control over a specific part of an image. I’ve had mixed results, but with the current Automatic1111 UI it’s preferable to inpainting if you might want to go back and play with the prompts used to generate the overall image after tweaking portions. (ComfyUI, a different frontend from Automatic1111, is another way to go about that; ultimately, I think that if people intend to compose images with a specific end goal in mind and want a lot of control over them, they’ll probably want to look into ComfyUI eventually. It has fewer extensions, though.)
Outpainting is another way to get larger images, to generate beyond the edges of an existing image; worth looking at at some point.
For prompt syntax, the only thing I use is strength: “(carrot:1.2)” gives the prompt term “carrot” 1.2 times the strength that just putting “carrot” into the prompt would. Commas don’t matter, other than to help make the prompt more readable to a human.
I found that Stable Diffusion does a really good job of imitating the style of a given artist. Some artists work better than others. Here’s maybe a starting point. It can also be useful to combine styles.
While I’m over here and would rather be on the Fediverse, I’d still at least search /r/StableDiffusion on Reddit, because there’s a fair bit of useful information there. Maybe not a tutorial, but useful tidbits and answered questions.
Incredibly helpful, thanks. It can seem a bit daunting just to start getting good images.
My son runs a couple of weekly D&D campaigns, so that’s the basis for my initial efforts. I am going to generate characters and some imagery to set the mood for his scenarios. I figured a firm project goal would help. He has very detailed descriptions of environments etc., so that should be helpful with prompts.
My big questions are regarding the best models and plugins to get going. Browsing Civitai looks like it will be very helpful. Thanks again.
The base model is probably fine for most things. I’d use that unless I were running into issues with it.
You can generate useful output without any extra models or extensions at all.
The most important thing for my immediate use was the upscaler I mentioned above. Without it, one is going to be limited in output resolution; I spent some time at first trying to figure out how to get higher-resolution output. If one doesn’t need higher-resolution output, that may not matter.