Use GenAI to Create Accurate Images of Specific Objects or Characters
AI Lab · 4 min read · September 9, 2024
Generative AI is now mainstream. Many of us use tools like DALL·E and Midjourney to create images for work. While building an application on top of these models, I ran into a limitation:
How do I ask these models to generate an image of a specific entity in a certain setting? For example: How do I generate a picture of my puppy sitting on a stool eating pasta? Not any puppy, MY puppy!
After some research, I came across a great solution that uses a combination of Stable Diffusion and Dreambooth. Stable Diffusion is an open-source machine learning model for generating images from text prompts.
Dreambooth is a technique that can fine-tune models like Stable Diffusion and teach them to associate a certain word with an entity. This fine-tuning is possible because Stable Diffusion is open source, giving us access to its architecture and weights.
Enough theory, let's practice. I will now use Dreambooth to train a Stable Diffusion model to associate pictures of my puppy with the word `hiccupmypup`. Fire up your Google Colab notebooks!
Install dependencies
We use the Hugging Face diffusers library to fine-tune the model as well as generate photos.
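A minimal Colab install cell might look like this (the exact package list is an assumption; `diffusers` pulls in most of what we need):

```python
# Install the libraries used in this walkthrough (Colab cell).
# The package list is an assumption; adjust as needed.
!pip install -q diffusers transformers accelerate sentencepiece
```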
- Create a `training_data` folder to store the images used to fine-tune the model. Additionally, create a folder called `trained_model` to write the fine-tuned model to (a sketch of this setup follows the list).
- Upload pictures of your subject to the `training_data` folder. In my case, it's going to be pictures of my puppy Hiccup.
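As a sketch, the folder setup can be done directly in the notebook:

```python
import os

# Folder for the images used to fine-tune the model
os.makedirs("training_data", exist_ok=True)
# Folder where the fine-tuned model will be written
os.makedirs("trained_model", exist_ok=True)
```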
We will be using the stabilityai/stable-diffusion-3-medium-diffusers model as the base model in this exercise. This model is hosted on Hugging Face as a gated model, so we will need to do the following to use it:
1. Create an account on Hugging Face
2. Request access to this gated model (link)
3. Create a user access token and store a copy of it on your local system notepad (link)
Once the above steps are done, run the code below. You will be prompted for a token; paste the user access token you saved to your notepad.
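A minimal login cell using the `huggingface_hub` library (installed alongside `diffusers`) looks like this:

```python
# Log in to Hugging Face; you will be prompted to paste your access token
from huggingface_hub import notebook_login

notebook_login()
```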
Time to begin fine-tuning. Let's take a look at some of the parameters we have used (a sketch of the full training command follows this list):
- `instance_prompt`: this parameter describes the training data to the model.
- `max_train_steps` & `learning_rate`: parameters that need to be tuned by you to get the best results for your input images.
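For reference, here is a sketch of what the fine-tuning invocation looks like using the DreamBooth example script from the diffusers repo. The script name follows the diffusers examples, and the flag values are illustrative assumptions you should tune for your own data:

```python
# Colab cell: launch DreamBooth fine-tuning via the diffusers example script.
# You may need to fetch train_dreambooth_sd3.py from the diffusers repo first.
# All values below are illustrative assumptions.
!accelerate launch train_dreambooth_sd3.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-3-medium-diffusers" \
  --instance_data_dir="training_data" \
  --output_dir="trained_model" \
  --instance_prompt="a photo of hiccupmypup dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=1e-6 \
  --max_train_steps=500
```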
Now that the training is done, a new model will have been written to the `trained_model` folder. Let's use this model to generate fun images!

- Pass a prompt as a parameter to `pipeline` to get the desired outputs.
- Try playing with the `num_inference_steps` and `guidance_scale` parameters.
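A minimal inference sketch, assuming the fine-tuned SD3 model sits in `trained_model` and the notebook has a GPU runtime (the prompt and parameter values are illustrative):

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Load the fine-tuned model written to the trained_model folder
pipeline = StableDiffusion3Pipeline.from_pretrained(
    "trained_model", torch_dtype=torch.float16
).to("cuda")

# "hiccupmypup" is the identifier the model learned during fine-tuning
image = pipeline(
    prompt="a photo of hiccupmypup dog sitting on a stool eating pasta",
    num_inference_steps=28,  # more steps: finer detail, slower generation
    guidance_scale=7.0,      # how closely the image follows the prompt
).images[0]
image.save("hiccup_pasta.png")
```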
Here are some images of Hiccup I generated:
Hopefully, in the future we will not need to fine-tune models in order to generate pictures of specific entities. NVIDIA recently released a paper that promises just that: link
Please reach out to us at info@betacrew.io for help building cutting-edge GenAI applications.