diff --git a/docs/help/diffusion.md b/docs/help/diffusion.md
index 0dbb09f304..7182a51d67 100644
--- a/docs/help/diffusion.md
+++ b/docs/help/diffusion.md
@@ -20,7 +20,7 @@ When you generate an image using text-to-image, multiple steps occur in latent s
 4. The VAE decodes the final latent image from latent space into image space.
 
 Image-to-image is a similar process, with only step 1 being different:
 
-1. The input image is encoded from image space into latent space by the VAE. Noise is then added to the input latent image. Denoising Strength dictates how may noise steps are added, and the amount of noise added at each step. A Denoising Strength of 0 means there are 0 steps and no noise added, resulting in an unchanged image, while a Denoising Strength of 1 results in the image being completely replaced with noise and a full set of denoising steps are performance. The process is then the same as steps 2-4 in the text-to-image process.
+1. The input image is encoded from image space into latent space by the VAE. Noise is then added to the input latent image. Denoising Strength dictates how many noise steps are added, and the amount of noise added at each step. A Denoising Strength of 0 means there are 0 steps and no noise added, resulting in an unchanged image, while a Denoising Strength of 1 results in the image being completely replaced with noise and a full set of denoising steps are performance. The process is then the same as steps 2-4 in the text-to-image process.
 
 Furthermore, a model provides the CLIP prompt tokenizer, the VAE, and a U-Net (where noise prediction occurs given a prompt and initial noise tensor).
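
The relationship between Denoising Strength and the number of steps described in the changed paragraph can be illustrated with a minimal sketch. It assumes a simple linear mapping from strength to step count, rounded down; the function name `img2img_schedule` and its parameters are hypothetical and are not taken from the documented application, which may compute its schedule differently.

```python
def img2img_schedule(total_steps: int, denoising_strength: float) -> tuple[int, int]:
    """Return (start_step, steps_to_run) for a hypothetical image-to-image run.

    Assumption: the number of denoising steps scales linearly with the
    strength and is rounded down. Real samplers may round differently.
    """
    if total_steps < 0:
        raise ValueError("total_steps must be non-negative")
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")

    # Strength 0 -> no steps (image unchanged); strength 1 -> the full set.
    steps_to_run = min(int(total_steps * denoising_strength), total_steps)

    # Skip the early part of the schedule: the input image already carries
    # structure, so denoising starts partway through.
    start_step = total_steps - steps_to_run
    return start_step, steps_to_run


if __name__ == "__main__":
    for strength in (0.0, 0.5, 1.0):
        start, count = img2img_schedule(30, strength)
        print(f"strength={strength}: start at step {start}, run {count} steps")
```

Under this assumed mapping, a strength of 0 runs no steps and leaves the image untouched, while a strength of 1 starts from step 0 and runs the whole schedule, matching the two extremes the paragraph describes.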