Overview of Commonly Used Models in Stable Diffusion Webui (Part 1)
Here is a brief introduction to the functionality and usage of commonly used models in Stable Diffusion Webui, including stable-diffusion-v1-5, mo-di-diffusion, Cyberpunk-Anime-Diffusion, Arcane-Diffusion, Openjourney v4, SamDoesArt-V3, Anything V5/V3, anything-v4.0.
Stable Diffusion v1.5 model
The runwayml/stable-diffusion-v1-5 model performs well at natural-image generation and image editing, but still has room to improve in rendering fine detail and in following long, complex prompts. The model provides a good foundation for the development of future image generation models.
- The model cannot achieve perfect photorealism. It cannot render legible text and struggles with compositional prompts such as “a red cube on top of a blue sphere”.
- Faces and people may not be generated correctly.
- The model was primarily trained on English language data and may not perform well on other languages.
- The auto-encoding part of the model has some loss.
- The model was trained on a large-scale dataset, which includes adult content and is not suitable for use in products without additional safety measures and considerations.
- Images that repeat in the training data may be memorized to some degree. The LAION website can be searched against the training data to help detect images that are likely to be memorized.
- While image generation models are impressive, they can also reinforce or exacerbate social biases. Stable Diffusion v1 was trained on datasets such as LAION-2B, primarily described in English. Text and images from communities and cultures using other languages may not be fully considered, affecting the overall output of the model, which often defaults to white and Western culture. Additionally, the model’s ability to generate content from non-English prompts is significantly lower than from English prompts.
- The model is expected to be used together with the NSFW safety checker in Diffusers, which works by comparing the model’s outputs against known, hard-coded NSFW concepts.
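As a concrete sketch of the above, the model can be loaded through the diffusers library, which enables its safety checker by default. Only the `runwayml/stable-diffusion-v1-5` id and the safety-checker behavior come from the text; the function name and prompt are illustrative, and the heavy imports are kept inside the function so the sketch can be read without the libraries installed.

```python
# Sketch of loading stable-diffusion-v1-5 via diffusers.
# Requires `pip install diffusers transformers torch` and a GPU.

def generate(prompt: str, device: str = "cuda"):
    import torch
    from diffusers import StableDiffusionPipeline

    # from_pretrained loads the bundled NSFW safety checker by default;
    # it compares generated images against hard-coded NSFW concepts.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to(device)
    return pipe(prompt).images[0]  # a PIL.Image

# Example (downloads the weights on first use):
# image = generate("a photo of an astronaut riding a horse")
# image.save("astronaut.png")
```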
mo-di-diffusion model
mo-di-diffusion is a Stable Diffusion 1.5 model fine-tuned on screenshots from a specific animation studio to achieve a “modern Disney style” effect. It was trained with the diffusers-based DreamBooth method using prior-preservation loss, and can be used like any other Stable Diffusion model. It is open-sourced under the CreativeML OpenRAIL-M license, which specifies usage rights and restrictions: you may use the model for free, but not to intentionally generate or share illegal or harmful content; the author claims no rights over the outputs you generate, and you are free to use them while taking responsibility for that use; and you may redistribute the weights and use the model commercially or as a service, provided you include the same usage restrictions and share a copy of the CreativeML OpenRAIL-M license with all your users.
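A minimal usage sketch for the paragraph above. The “modern disney style” trigger phrase is an assumption taken from the model card, and the helper function is illustrative, not part of the model’s API:

```python
# Assumed trigger phrase for mo-di-diffusion (from the model card).
TRIGGER = "modern disney style"

def build_prompt(subject: str) -> str:
    # Prepend the trigger phrase the model was fine-tuned on.
    return f"{TRIGGER} {subject}"

def generate(subject: str, device: str = "cuda"):
    # Heavy imports kept local; needs `pip install diffusers torch` and a GPU.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "nitrosocke/mo-di-diffusion", torch_dtype=torch.float16
    ).to(device)
    return pipe(build_prompt(subject)).images[0]

# build_prompt("magical princess with golden hair")
# -> "modern disney style magical princess with golden hair"
```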
Arcane Diffusion model
Arcane-Diffusion is a Stable Diffusion model fine-tuned on screenshots from the animated series Arcane. Using the token “arcane style” in the prompt applies the show’s distinctive visual style to the generated image; without the token, the model behaves much like a regular Stable Diffusion checkpoint. The model can be loaded and used like any other Stable Diffusion model, and the author has released several versions, with later versions trained using improved DreamBooth techniques for more consistent style transfer.
DGSpitzer/Cyberpunk-Anime-Diffusion is an AI model for generating cyberpunk anime characters, fine-tuned with DreamBooth from Waifu Diffusion V1.3 together with the new Stable Diffusion V1.5 VAE. After loading the model, using the keywords “dgs” and “illustration style” in the prompt yields better results. For robotic male characters, you can add “muscular male” to improve the output quality.
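The keyword advice above can be captured in a small prompt-building helper. The helper itself is illustrative (only the model id and the keywords come from the text), and assumes the keywords are simply joined into an ordinary comma-separated prompt:

```python
def build_prompt(subject: str, robotic_male: bool = False) -> str:
    # Keywords from the model card: "dgs" plus "illustration style";
    # "muscular male" reportedly improves robotic male characters.
    parts = ["dgs illustration style", subject]
    if robotic_male:
        parts.append("muscular male")
    return ", ".join(parts)

def generate(subject: str, device: str = "cuda", **kwargs):
    # Heavy imports kept local; needs `pip install diffusers torch` and a GPU.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "DGSpitzer/Cyberpunk-Anime-Diffusion", torch_dtype=torch.float16
    ).to(device)
    return pipe(build_prompt(subject, **kwargs)).images[0]
```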
Openjourney v4 is trained on Stable Diffusion v1.5 using 124,000 images, for a total of 12,400 training steps and 4 epochs, with a training time of 32 hours. The author mentioned that the prompt “mdjrny-v4 style” is no longer needed when using Openjourney v4. In general, Openjourney v4 can generate images in various styles. It has been further trained and optimized on top of Stable Diffusion v1.5, resulting in more realistic and lifelike image generation.
Sandro-Halpo/SamDoesArt-V3 can be triggered using the token “SamDoesArt” and can be used anywhere in the prompt. It is recommended to place it at the beginning of the prompt, as this may produce slightly different results compared to placing it at the end. It is recommended to do some testing to find the optimal location for the token that suits individual preferences.
It is not recommended to immediately follow the keyword “SamDoesArt” with the word “style”, as this may produce unpredictable and strange results. Whether adding a comma after “SamDoesArt” makes any difference is hard to determine.
Examples include: “SamDoesArt, portrait of a pretty girl”, “SamDoesArt, a man working in a factory, manly, machines”, “SamDoesArt, an African lion, mane, majestic”.
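The token-placement advice above can be sketched as a small helper. This is purely illustrative: only the “SamDoesArt” token and the placement guidance come from the text, and the comma formatting follows the examples given.

```python
SAMDOESART = "SamDoesArt"

def build_prompt(subject: str, token_first: bool = True) -> str:
    # Leading placement is the recommendation; trailing placement may
    # give slightly different results, so both are supported here.
    # Avoid following the token directly with the word "style".
    if token_first:
        return f"{SAMDOESART}, {subject}"
    return f"{subject}, {SAMDOESART}"

# build_prompt("portrait of a pretty girl")
# -> "SamDoesArt, portrait of a pretty girl"
```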
For more information, please visit the visual guide.
Anything V5/V3 model
Anything V5/V3 is a model merged from a hodgepodge of various components. The author himself describes it as a junk model with little fidelity to input tags, so it often adds irrelevant and messy details when used. After discovering this issue, the author refrained from merging new models for a long time, as merge models of this nature are simply a waste of time. To learn more, please click on the link provided.
andite/anything-v4.0 is a model designed for generating anime-style images, which can produce high-quality and highly-detailed anime images based on short prompts. The model supports inputting danbooru tags to generate images. For example, inputting “1girl, white hair, golden eyes, beautiful eyes, detail, flower meadow, cumulonimbus clouds, lighting, detailed sky, garden” can generate a corresponding anime-style image.
The model has a web interface that can be used directly, and it can also be run on platforms such as Hugging Face and Google Colab. Users can use this model like other Stable Diffusion models. The model is licensed under the CreativeML OpenRAIL-M license, which means that the model is open for use and can be commercialized.
Here are some examples of generated images from the model: Inputting “masterpiece, best quality, 1girl, white hair, medium hair, cat ears, closed eyes, looking at viewer, :3, cute, scarf, jacket, outdoors, streets” can generate an anime girl image. Inputting “1boy, bishounen, casual, indoors, sitting, coffee shop, bokeh” can generate an anime boy image. Inputting “scenery, village, outdoors, sky, clouds” can generate a landscape image.
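Since anything-v4.0 takes comma-separated danbooru-style tags, the example prompts above can be assembled with a small helper. The helper is illustrative; only the model id and the tag-based prompt format come from the text:

```python
def tags_to_prompt(*tags: str) -> str:
    # anything-v4.0 accepts comma-separated danbooru-style tags.
    return ", ".join(tags)

def generate(*tags: str, device: str = "cuda"):
    # Heavy imports kept local; needs `pip install diffusers torch` and a GPU.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "andite/anything-v4.0", torch_dtype=torch.float16
    ).to(device)
    return pipe(tags_to_prompt(*tags)).images[0]

# tags_to_prompt("1girl", "white hair", "golden eyes")
# -> "1girl, white hair, golden eyes"
```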
The model was developed by Rico, InterestingHuman, and Fannovel16, who received a lot of help and support during the development process. The model is open for use, subject to the use restrictions of the CreativeML OpenRAIL-M license, which users must comply with.