Image descriptions can, of course, be written manually, but I've trained a custom GPT for this task; here's the link:
https://chatgpt.com/g/g-wvc9iwYuc-image-captioner
Just upload your images without asking anything; it knows what to do, and the output will be a neat table.
If you want to know which criteria are used for the descriptions, there is a button "Show the list of parameters..." on the start page. The only caveat is that ChatGPT does not handle large uploads well, so it is better to upload images in small batches, for example 5 at a time.
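Splitting a large image set into batches of 5 before uploading can be scripted; a minimal sketch (the "dataset" folder name is hypothetical, point it at your own images):

```python
from pathlib import Path

def chunk(items, size=5):
    """Split a sequence into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# "dataset" is a hypothetical folder; adjust to wherever your images live.
folder = Path("dataset")
if folder.is_dir():
    for n, batch in enumerate(chunk(sorted(folder.glob("*.jpg")), 5), 1):
        print(f"Batch {n}: {[p.name for p in batch]}")
```

Each printed batch is one message's worth of uploads for the custom GPT.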
Tomorrow we will go through the parameters in Google Colab in detail. If you have time to build your dataset, you can test it right away. An important point: we will need paid Colab, because we need an A100 GPU. On the free tier, at best you cannot always connect to it, and at worst it is not available at all. One training session cost me about 30 compute units, which is roughly €3.
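The cost arithmetic above can be sketched as a one-liner; the price per 100 compute units below is an assumption (roughly what Colab's pay-as-you-go pack cost at the time), so check current rates:

```python
def session_cost(compute_units, price_per_100=10.0):
    """Rough session cost, assuming a flat price per 100 compute units.

    price_per_100 is an assumed figure, not an official Colab rate.
    """
    return compute_units * price_per_100 / 100.0

# One training run took ~30 compute units:
print(session_cost(30))  # ≈ 3.0
```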
Google Colab, what we need:
- around 100 compute units to start, enough for a few tests (you can buy them here)
- Google Drive with 100 GB of storage (link).
Next, open the Colab I already linked and go to the parameter setup in the second section, Step 2. Setup Config.
#comfyui #lora #flux
The first three parameters are simple:
- choose a name for your model (literally any name, but I prefer a prefix referencing the base model, in this case FLUX_)
- specify the path to the Google Drive folder where your dataset is located
- specify the path to the Google Drive folder where the LoRA models and the images from test renders will be saved.
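Filled in, those three form fields look roughly like this; the name and paths below are placeholders, not the notebook's actual defaults:

```python
# Hypothetical values for the first three Colab form fields.
model_name   = "FLUX_coop_style"                      # any name; the FLUX_ prefix marks the base model
dataset_path = "/content/drive/MyDrive/lora/dataset"  # folder with your training images
output_path  = "/content/drive/MyDrive/lora/output"   # LoRA files and test renders go here

print(model_name, dataset_path, output_path)
```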
Next we have the sample prompts. These prompts are used to generate images after a certain number of training steps (by default, test renders run every 250 steps). It is, of course, better to write prompts that suit your purposes. I usually take one of them verbatim from one of the images in the dataset, i.e. copy its caption, and the other two are free-form but still on the topic of architecture.
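For example, a set of three sample prompts might look like this (the prompt texts are made up for illustration; the first would normally be copied from one of your dataset captions):

```python
# Hypothetical sample prompts: the first copied verbatim from a dataset
# caption, the other two free-form but on the same architectural topic.
sample_prompts = [
    "a photo of a deconstructivist office tower with tilted glass volumes",
    "an angular museum facade of steel and glass at dusk",
    "a futuristic cultural center with fragmented geometric forms",
]
sample_every_n_steps = 250  # the default: test renders after every 250 steps
```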
The three parameters that most affect the training process:
- network_rank controls how many distinct details from the dataset images the LoRA will be able to memorize and reuse. The higher the value, the more attention to detail, but also the larger the resulting LoRA file. So if the overall style of the images matters to you more than individual elements, set something around 16 or 32; if details are important, use 64 or even 128.
- learning_rate controls how quickly the model trains. The usual range here is 0.0001 to 0.0015. The higher the value, the faster you will reach an overtrained model. Generally, the smaller your dataset, the higher the learning_rate should be, otherwise you risk wasting a lot of time and money. The risk is minor, because we have test renders to draw conclusions from and can stop the process; and via steps_number you can set the limit at which the process stops by itself.
- steps_number is the number of training steps. For a dataset of 50 images, the optimal value is usually between 2000 and 4000 steps.
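Putting those rules of thumb together for a ~50-image style dataset might look like this; the exact numbers are illustrative, not prescriptive:

```python
# Sketch of the three key training values for a ~50-image style dataset.
network_rank  = 32      # 16-32 favors overall style; 64-128 captures fine detail
learning_rate = 0.0004  # stay within 0.0001-0.0015; raise it for smaller datasets
steps_number  = 3000    # ~2000-4000 steps suits a dataset of about 50 images

# Sanity checks against the ranges from the post:
assert 0.0001 <= learning_rate <= 0.0015
assert 2000 <= steps_number <= 4000
```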
content_or_style is a simple switch; as the name suggests, you choose whether you care more about the image style, the content, or both.
metainfo_name and metainfo_version are information embedded into the model file; these parameters do not affect training.
image_width and image_height set the resolution of the images used for test renders. It is reasonable to leave 1024x1024 here.
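The remaining fields might be filled in like this; the exact option string for content_or_style is an assumption, so check the choices offered in the Colab form itself:

```python
# Placeholder values for the remaining Colab form fields.
content_or_style = "style"            # or "content", or the combined option (assumed names)
metainfo_name    = "FLUX_coop_style"  # embedded in the model file; no effect on training
metainfo_version = "1.0"
image_width  = 1024                   # test-render resolution; 1024x1024 is a safe default
image_height = 1024
```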
[Screenshot: image_2024-11-05_17-40-04.png, 296.8 KB — the runtime type settings]
Once these parameters are configured, you can start the training process. Select the runtime type as shown in the screenshot.
Then go to Runtime > Restart session and run all. At the first step, Colab will ask you to connect to Google Drive, and after that everything runs automatically; you only need to monitor the results folder for test renders and, if something goes wrong, stop the process via Runtime > Disconnect and delete runtime.
There are a few other nuances we will look at next time, but in general this information is enough to start training your own models.
An interesting feature of FLUX model training is that captions in the dataset are not necessary at all; a folder with images is enough. The reason is the T5 encoder, which is essentially an LLM (large language model) and can generate captions itself. You may have noticed that T5 is also used alongside the usual CLIP (familiar from SDXL and earlier versions of SD) in ComfyUI's FLUX workflows; there it interprets your prompt more accurately, and in training it interprets captions or even creates them.
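In practice this means caption files are optional per image: trainers typically look for a .txt file next to each image and fall back to no caption if it is missing. A minimal sketch of that pairing logic (the folder layout is the common image + same-name .txt convention, assumed here):

```python
from pathlib import Path

def load_dataset(folder):
    """Pair each image with its caption file, if one exists.

    Captions are optional for FLUX training because the T5 encoder can
    generate them itself; missing captions are recorded as None.
    """
    pairs = []
    for img in sorted(Path(folder).glob("*.jpg")):
        txt = img.with_suffix(".txt")
        caption = txt.read_text().strip() if txt.exists() else None
        pairs.append((img.name, caption))
    return pairs
```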
#comfyui #lora #flux
Let's compare: these are renders with exactly the same prompts, seeds, and parameters; only the LoRA differs. The LoRA models were also trained with the same seeds and parameters, but the first image uses a LoRA whose dataset had captions, while the second uses one trained on images only (the images themselves are exactly the same as in the first dataset).
And here is another example. Recall that the dataset consisted of photos of Coop Himmelb(l)au projects. The conclusion: if the dataset had captions, elements of real buildings from the firm's portfolio are borrowed almost literally. This can, of course, be mitigated by reducing the LoRA weight at render time or by mixing it with other LoRA models. Without captions, the general architectural style remains, but without precise reproduction of the details. Either behavior can be useful, depending on the situation.
Can ChatGPT be used to change information in a Revit model? Check out this video that Anatoly Razinkov prepared for our upcoming workshop on integrating AI into BIM.
The workshop will be in Russian, but if you are interested in an English version, leave a comment.
Let's talk about LoRA training based on the Schnell model. Strictly speaking, this should be impossible, because the model itself is not suited to the task, but the developers at Ostris have prepared a special adapter that solves the problem. In the Google Colab code you need to add a link to the model and the adapter, and also replace the parameters for test rendering. I've already done it; here's the link.
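The extra settings amount to roughly this; the model and adapter identifiers below are assumptions based on the Ostris adapter published on Hugging Face, and the sampling values reflect Schnell being distilled for few-step inference (the linked Colab already wires all of this in):

```python
# Hypothetical sketch of the Schnell-specific additions.
base_model        = "black-forest-labs/FLUX.1-schnell"       # assumed HF model id
assistant_adapter = "ostris/FLUX.1-schnell-training-adapter"  # assumed HF adapter id

# Schnell is distilled for few-step inference, so test renders use far fewer
# steps and no extra guidance (values are assumptions, not the Colab defaults):
sample_steps    = 4
sample_guidance = 1.0
```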
#comfyui #lora #flux