Questions

#2
by LOL2024 - opened

Hello, I just have some questions about this model. Could you tell me how you trained it: fine-tuned or trained from scratch? How many pictures were used to train it? Why is the architecture SD1.5 instead of SDXL or another architecture (such as Flux.1 Schnell or PixArt-Sigma)? And will you add CC0 images from Wikimedia Commons to the training dataset in the future?

Excellent questions, all of them will be answered in a paper soon. The model was trained from scratch, not fine-tuned, and we used the Microcosmos dataset for this (you can check it out here: Microcosmos on HuggingFace). I'm still working on a final version; right now it's not final, but it's a good start. We used about 15k images, which is a bit on the lower side (the ideal number is closer to 50k), so it's going to be overfitted, but still usable. I want to reach the mark of half a million quality, captioned CC0 images.

I went with Stable Diffusion 1.5 because it's powerful but also less computationally demanding than something like SDXL or Flux.1 Schnell; I have just a 3090 and two 3060s. I actually think good V-prediction models (which could be our next iteration) are on par with Flux and SDXL, so there's room to experiment with that in the future. As for Wikimedia Commons, it's a bit tricky: while a lot of it is marked CC0, sometimes it's not actually free to use, so if you're aiming for 100% correct usage, you have to be super careful with those images.

One of the biggest challenges now is producing high-quality captions. I'm using GIT (Generative Image-to-text Transformer) and Gemini, with human review of everything. With a small number of images, each error is much more significant, and it saddens me that many of the cool things can't go into the data because they are all copyrighted. I hope to see how legislators handle this situation, because I'm considering adding some AI-generated content to the dataset, which would help with the missing concepts.

Thanks for the answer, this model is great; I hope to see the final version soon. However, I have another question: will it feature a rating system similar to Pony Diffusion, using prompts such as 'score_9', 'score_7_up', 'source_cartoon', and 'rating_safe' to control the quality and content of generated images? Or, like common SD1.5 and SDXL models, using prompts such as 'masterpiece', 'best quality', 'medium quality', and 'worst quality' to control the quality of generated images?

In some subsets, like the one from OpenGameArt, I did some score rating, because some of the art was good and some was boring. I might add more nuances of that in newly classified images; most of the CC0 images are boring...

Well, could you tell me how to contribute CC0 images, or my own CC0 works (if that's a thing), to the training dataset?
Just check that they really are licensed CC0, upload them, and make a pull request for them?

I think it would be like that. I usually group elements by what is in them and what the images contain, and then by source: the image site and how the work is licensed on that site. I need to add good or interesting illustrations.
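
To make that grouping concrete, here is a minimal sketch of one way to lay out contributions by source site and license. The `dataset_path` helper and the `<source>/<license>/<filename>` layout are my illustration, not the dataset's actual structure:

```python
# Hypothetical helper sketching one way to group contributed images
# by source site and license, as described above. The folder layout
# ("<source>/<license>/<filename>") is an illustration only.
import re

def dataset_path(source_site: str, license_tag: str, filename: str) -> str:
    """Build a tidy relative path like 'opengameart/cc0/sprite_01.png'."""
    def slug(s: str) -> str:
        # lowercase and replace anything non-alphanumeric with underscores
        return re.sub(r"[^a-z0-9]+", "_", s.lower()).strip("_")
    return f"{slug(source_site)}/{slug(license_tag)}/{filename}"

print(dataset_path("OpenGameArt", "CC0", "sprite_01.png"))
# -> opengameart/cc0/sprite_01.png
```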

Emm, have CC0 3D resources on opengameart.org already been used in this model's training dataset, such as textures for 3D models and previews (or multiple views) of 3D models?

I haven't added any 3D images yet. I think it might help the model achieve better volume or reduce some flatness. There are some CC0 models on Sketchfab; I have downloaded some and taken some screenshots. But be aware that some of them are marked as CC0 while the descriptions say CC BY or something else (theoretically it would be OK to add those to the dataset, but I give preference to fully CC0 content). If you could collect some 3D screenshots, I would gladly classify them and add them to the Microcosmos dataset.

Preferably larger than or equal to 768x768.
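
A quick way to pre-filter contributions for that size recommendation, as a sketch (the function name and structure are mine; in practice you would read the dimensions with e.g. Pillow's `Image.open(path).size`):

```python
# Sketch of a pre-filter for the size recommendation above: keep only
# images whose smaller side is at least 768 px. Pure-Python check on
# (width, height); the helper name and threshold handling are illustrative.
def meets_min_size(width: int, height: int, min_side: int = 768) -> bool:
    """True if both dimensions are at least min_side (e.g. 768x768 or larger)."""
    return min(width, height) >= min_side

print(meets_min_size(512, 512), meets_min_size(1024, 768))  # False True
```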

Sorry, I forgot to ask: what are the recommended captioning methods for contributed images? WD1.4, BLIP, JoyTag, or any others? Will it use a fine-tuned CLIP model like this* in the future? Does the training dataset allow contributing images rated questionable or explicit? Could I use prompts such as 'from Met Museum Open Access', 'uploaded on opengameart', or 'by (artist)' tags to get style-specific results from the model?

*Here is the fine-tuning code of the CLIP model mentioned above.

I used GIT and the Gemini API (which is free but has some censorship, or refuses some images), plus JointTagger and human captioning, each in different subsets. The performance and accuracy of the model depend on the subset.
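
Since captions come from several sources (GIT, Gemini, taggers, human review), a small normalization step helps keep the subsets consistent. A sketch only; the helper and its rules are my illustration, not the actual pipeline:

```python
# Illustrative caption/tag normalization for merging captions from
# several sources (e.g. a model caption plus human-review tags):
# lowercase, trim whitespace, drop duplicates while preserving order.
def merge_tags(*sources: str) -> str:
    seen = set()
    out = []
    for src in sources:
        for tag in src.split(","):
            tag = tag.strip().lower()
            if tag and tag not in seen:
                seen.add(tag)
                out.append(tag)
    return ", ".join(out)

print(merge_tags("A red mushroom, forest", "forest, Detailed Illustration"))
# -> a red mushroom, forest, detailed illustration
```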

I find it hard to believe that explicit images are CC0, but if you still want to contribute, I don't see a problem as long as you mention or highlight them. I think there are some strange things like that in the Met Museum.

Feel free to contribute any CC0 image and caption, as long as it is accurate. A newer model is cooking; by the end of the week there will be something to test.

I wished not to include artist tags, so it would only have styles, like Pony XL should have done.

https://github.com/6DammK9/nai-anime-pure-negative-prompt/blob/main/ch02/pony_sd.md
In fact, PonyXL trained on artists' styles in a pretty weird way.

In the beginning, it was said that Pony XL was trained using only aesthetic styles, not artist tags, but this was partially untrue, which caused a lot of confusion because it seemed like data laundering, which is something I would like to avoid. Some laws allow the use of data, even through web scraping, if there is no individualization.

The idea of the Microcosmos dataset is a small dataset with a large variety. The idea of the model is to show how it is possible to create a model that can satisfy legislation like mine 🇧🇷, and finally the article will detail all of this. Probably when I finish everything.

If the laws change, all this becomes irrelevant. Another topic I would like to talk about, since we are already talking here, is "distillation", which is training a model on the best results from another one, so you end up with a model with fewer parameters and you could say that your data is "clean".
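
As a toy illustration of what distillation means in this sense (all numbers are invented; real distillation matches whole output distributions, e.g. denoising predictions, not a single scalar):

```python
# Toy sketch of distillation: a "student" value is nudged toward a
# "teacher" model's output by gradient descent on a squared error.
# Hypothetical numbers; real distillation matches full output
# distributions of the teacher, not one scalar.
teacher = 0.8    # larger model's prediction for some input
student = 0.2    # smaller model's initial prediction
lr = 0.1
for _ in range(20):
    grad = 2 * (student - teacher)   # d/ds (student - teacher)^2
    student -= lr * grad
print(round(student, 3))  # approaches the teacher's 0.8
```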

If I were to develop this software strictly by the rules of my country, which has no fair use or any other legal permissiveness, it would fit the definition of piracy. Most Latin American countries adopt civil law instead of common law.

The result is that people from other countries can train models with infinitely more data than any developer here could legally obtain.

Sorry, I want to know how to prevent AI-looking content when generating images with this model. Just add "ai content, generated, diffusion" to the negative prompt?

It's not possible to differentiate in some cases. KissCC0 is one of the sources of the dataset, and it contains some images I believe are AI-generated; it's hard to differentiate the generic images that normally exist on free sites from images that are generic but AI. I suggest using "generic" in the negative prompt; some images I wasn't so sure about, I labeled this way. Also, some images on Pexels are AI-generated, but it's hard to affirm whether something is AI or not, and stock sites are flooded with this kind of image, so that's a problem for the future. AI images are not included, or not intentionally included, though they might be in a future iteration; for this one it's safe to say that most of the training data is old-school CC0/CC, no AI content. A viable way to eliminate that aspect would be to train a negative LoRA/textual inversion on just these types of generic/AI images, to exclude those aspects.
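
To try the "generic" suggestion in practice, the negative prompt can be extended before calling the pipeline. The helper below is my own sketch; the diffusers call at the bottom is shown only as a guarded example of where it would plug in (it downloads weights, so it is commented out):

```python
# Sketch: append "generic" (and other unwanted aspects) to a negative
# prompt, as suggested above. The helper is illustrative.
def extend_negative(base: str, extra_terms) -> str:
    terms = [t.strip() for t in base.split(",") if t.strip()]
    for t in extra_terms:
        if t not in terms:
            terms.append(t)
    return ", ".join(terms)

negative = extend_negative(
    "worst quality, low quality, blur",
    ["generic", "ai content"],
)
print(negative)
# -> worst quality, low quality, blur, generic, ai content

# Where it would be used (requires diffusers + model weights, not run here):
# from diffusers import StableDiffusionPipeline
# pipe = StableDiffusionPipeline.from_pretrained("path/to/model")
# image = pipe("a forest mushroom", negative_prompt=negative).images[0]
```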

The next model iteration will come out after 180 hours of training on 16,452 images.

Although it's a model trained on a CC0 dataset, its latest version can actually generate Pikachu lol.
grid-0000.png

I noticed this as well. I believe it might be a remnant of some latent representation in the CLIP model itself, as this phenomenon is quite common. Even though it has not been trained on images of Pikachu, the CLIP knows how to represent it. Many CC-trained models, like CommonCanvas and others, sometimes generate images that resemble well-known characters like Pikachu. The only way to resolve this would be to completely retrain the CLIP, which would be a nightmare.
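
The intuition, sketched with toy vectors (real CLIP embeddings are learned, 768-dimensional vectors; these 3-d numbers are invented purely to illustrate why a concept can survive in CLIP even when the diffusion model never saw it):

```python
# Toy cosine-similarity sketch of the point above: CLIP's text encoder
# can place a concept ("pikachu") near related image features even if
# the diffusion model itself was never trained on such images.
# Vectors are invented 3-d stand-ins for real 768-d embeddings.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

text_pikachu     = [0.9, 0.1, 0.0]  # hypothetical text embedding
img_yellow_mouse = [0.8, 0.2, 0.1]  # hypothetical image embedding
img_landscape    = [0.0, 0.1, 0.9]

# The prompt lands near yellow-mouse-like features regardless of what
# the UNet was trained on:
print(cosine(text_pikachu, img_yellow_mouse) > cosine(text_pikachu, img_landscape))
# -> True
```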

But the same doesn't happen with Lucario or other Pokémon; that's why I'm sure it's a CLIP thing.
grid-0000.png
lucario, pokemon , digital_media_(illustration), best quality, 8k,
Negative prompt: worst quality, low quality, blur, EasyNegative, lowres, bad anatomy, bad hands, text, missing fingers, extra digit, fewer digits, blur, low res

Pixel-Dust changed discussion status to closed
Pixel-Dust changed discussion status to open

I pressed close instead of commenting lol. Another aspect is that even though "you can't register a style, only individual works", that doesn't apply to characters. This creates a complex situation with fan art, parodies, and imitations: the style itself can be used, but the characters they contain can't. That's one of the reasons I don't include some art, even pieces marked as CC0 or given to me by some artists, because the characters were from some story or something like that.

https://xcancel.com/elanmitsua/status/1865939793793933510
Well, I found there is a CLIP model with a clear license published recently, but I don't know whether it could be used with this model.

Thanks. I've used the CLIP from the original 1.5; retraining on another CLIP is a job for future me. Good news: I got another 3090, I just need to manage multi-GPU on Linux. Having some driver issues.

Hey there, love what you're doing! I've been preparing some contributions (mostly 3D poses so far) for the Elan Mitsua project, which is similar in concept to yours (though the final model isn't CC0), and I'd love to submit them for your use as well. Is there anywhere else for people to chat about this (a Discord server, perhaps), or just here?

If it is CC0, it means it can be incorporated under other licenses; most of the training content is CC0, so there is no problem using it elsewhere. You can add to https://huggingface.co/datasets/Pixel-Dust/Microcosmos
I've been wanting to make a model for generating free game assets, but good images are not CC0 or similar, so any contribution is fine. I don't have much in the way of plans for a Discord or anything like that, but you can find me on the Furry Diffusion Discord. I do think it's a good idea to create a curation of CC0 or permissively licensed content for training; as discussed before, not everything in public access is copyright-free.

I am working now on the next iteration of the model. Some cool designs are coming into the public domain this year, so I hope to curate 1-2M images.

Now I've got two 3090s and more RAM, so I hope to do more stuff.

Although I see more and more people switching from SD1.5 to SDXL, SD3.5, and Flux, I still hope to see the next version of this model.

To be honest, Elan Mitsua is good, but it has too many restrictions, such as banning LoRA training and banning AI training on images generated by Mitsua Likes. Although that's understandable, it's why it can't spread. And Mitsua Likes can't generate furry, so I stopped using it after I tested it.

For the next iteration I plan to solve some of that by adding AI images. I personally don't believe that AI content has no copyright, but it seems that's what most people believe.
If AI-generated images are used, they would go in a separate dataset from Microcosmos, which will remain traditional CC0/similar content.

The Mitsua model is infinitely better than mine; I hope to close that gap, but I understand why he made that decision; the thing is, it's the internet, so... The decision I made to stay on 1.5 is because of the amount of LoRAs, embeddings, VAEs... and all the things that are still relevant in the ecosystem, the "democratization of high-quality image generation" it was supposed to be. As said before, I hope for some legal clarification in the future on AI licenses and everything else.

I think 1.5 is good enough for this. I see all the good things made in XL, and I can see a well-trained 1.5 doing the same, not to mention it's more accessible.

If you want to add some public domain/CC0 images from Wikimedia Commons, here is a dataset of them, but only the images are public domain/CC0; the captions are CC-BY-SA-4.0:
https://huggingface.co/datasets/Mitsua/safe-commons-pd-3m
If you want to add AI-generated images, here are also some datasets, but they may need strict filtering to ensure they don't decrease model quality:
https://huggingface.co/datasets/pls2000/aiart_channel_nai3_geachu
https://huggingface.co/datasets/deepghs/aibooru_full
https://huggingface.co/datasets/deepghs/e6ai_full

I'd rather put in my own gens than curate chunks of AI content. The problem is that there are many similar images on e6ai; they are good images, but I want more diversity. Maybe it's time for an AI content curation; I hope someone else can help me with an audit when Microcosmos gets bigger. I am collecting other normal images and using Gemini to extract the captions, to generate a synthetic dataset based on those captions. Not sure if it's going to work, but I will try.
