---
license: openrail
pipeline_tag: text-to-image
tags:
- Stable Diffusion
- photorealistic
- sd_1.5
---

https://civitai.com/models/597300/boltmonkey-photoreal?modelVersionId=667353

This is an extremely high-quality photorealistic SD1.5 model that I created as an offshoot of a business project I work on in my spare time. I believe in the open-source nature of AI and am gradually releasing some of the work that I do not intend to use for my ongoing project. I have been developing this model slowly for roughly a year.

I have labelled this model as a merge, but it is already 30+ iterations deep, including a substantial number of block merges and multiple fine-tunes along the way.

The model is very realistic, especially for SD1.5.

Hands are generally five-fingered and not mangled, but overly complex or poorly structured prompting can result in amputations or distortions.

Most textures are well-rendered, but I have found that extremely dusty environments (such as in a mine tunnel) look a bit too generic for my liking.

Lighting and shadows are a strong point of the model. In particular, volumetric lighting (such as light rays through a misty or dusty atmosphere) is well-rendered.

Most of my showcase uses animals, but the model is adept at generating humans, architecture, natural environments, food, and so on, though I find that I have not trained it enough on most forms of transport.
## Suggestions for use

I have a lot to learn about prompting from the collective CivitAI userbase, but here are a couple of things that I have found work well:

TL;DR:

DDIM, 15-40 steps, CFG ~2-10, clip skip 1-4 (depending on use); LoRAs work well.
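As a sketch of how these settings map onto code, here is a minimal, hypothetical diffusers example. The checkpoint filename below is a placeholder, and this is just the settings suggested here, not an official recipe for this model:

```python
# Hypothetical sketch only: the checkpoint path is a placeholder, and these
# values simply mirror the suggestions in this card.
SETTINGS = {
    "num_inference_steps": 25,  # 15 is enough; 25-40 is typical
    "guidance_scale": 4.0,      # CFG: 2-4 works well, up to ~10 at times
    "width": 768,               # 768px+ works best; ~1024px risks duplication
    "height": 768,
    "clip_skip": 1,             # raise to 2-3 when merging disparate concepts
}

def generate(prompt: str, model_path: str = "./boltmonkey-photoreal.safetensors"):
    """Load the checkpoint and generate one image with the suggested settings."""
    import torch
    from diffusers import StableDiffusionPipeline, DDIMScheduler

    pipe = StableDiffusionPipeline.from_single_file(model_path, torch_dtype=torch.float16)
    # Swap in the DDIM sampler recommended above.
    pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
    pipe = pipe.to("cuda")
    return pipe(prompt, **SETTINGS).images[0]
```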

This model works well with both square and rectangular aspect ratios. Resolutions of 768px and above work best but will sometimes result in duplications around 1024px. Having said that, 512px and above will still produce good images.
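SD1.5 dimensions are conventionally kept to multiples of 64, so one way to sanity-check a requested size is a small helper like the following. This is a hypothetical utility, not part of the model; the 1024px duplication threshold is just the observation above:

```python
# Hypothetical helper: snap a requested size to the multiples of 64 that
# SD1.5 checkpoints conventionally use, and flag sizes where duplication
# artifacts can appear (around a 1024px long edge and beyond).
def snap_resolution(width: int, height: int) -> tuple[int, int, bool]:
    """Return (width, height) rounded to multiples of 64, plus a duplication-risk flag."""
    snap = lambda v: max(512, round(v / 64) * 64)
    w, h = snap(width), snap(height)
    risky = max(w, h) >= 1024  # long edge at ~1024px+ may duplicate subjects
    return w, h, risky

print(snap_resolution(768, 768))    # → (768, 768, False)
print(snap_resolution(1000, 1500))  # → (1024, 1472, True)
```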

The quality of this model's output is very realistic even with minimal prompting, but it is exceptional with well-structured prompts. Moreover, this model works very well with LoRAs so long as you are cognisant of the LoRA's training resolution (768px+ works best). I don't use anime LoRAs, so I can't offer any suggestions there, but I will be interested in your results if you try them.

Good-quality photorealistic images will result from extremely simple prompts (e.g., "cat"), but the model responds very well to quality-guidance prompts and some more complex prompting too.

The following prompt is my go-to:

"ultrarealistic photography, 32k UHD, absurdres, natural light and shadows, volumetric lighting, natural skin textures, accurate attention to details, depth of field, sharp focus"

Typically I would use DPM++_3m_SDE_GPU as my sampler with the SGM_uniform noise schedule, but I find that this model works best (to my taste) with the DDIM sampler and the DDIM_uniform noise schedule.

15 steps is enough to get good images most of the time, but I typically use 25-40. I have run a few generations at ComfyUI's maximum of 999 steps just to see how it fares; the results look great, of course, but I see no real need to go past 50 at most.

CFG is a difficult one to give a value for. A CFG of 2-4 works well, but sometimes I will take it as high as 10 depending on what I am generating. I suggest starting with a value of 4 and gauging it for yourself. Naturally, lower values give the model more freedom.
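For intuition about why lower CFG values give the model more freedom, here is a toy sketch of the standard classifier-free guidance formula that the CFG slider controls (an illustration of the general technique, not this model's internals):

```python
# Toy sketch of classifier-free guidance: the final noise prediction is the
# unconditional one pushed along the direction of the prompt-conditioned one,
# scaled by the CFG value.
def apply_cfg(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

uncond, cond = [0.1, 0.2], [0.5, 0.0]
print(apply_cfg(uncond, cond, 1.0))  # scale 1 reduces to the conditional prediction
print(apply_cfg(uncond, cond, 4.0))  # higher scale follows the prompt harder
```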

This model works well without clip skipping, but if you are merging several disparate concepts into one image, then it may pay to use a clip skip of 2 or 3 to give some fluidity to the concepts.
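To make "clip skip" concrete, here is a toy illustration of the common A1111/ComfyUI convention (an assumption about the UI you use): clip skip N takes the text encoder's hidden states from the Nth-from-last layer, so clip skip 1 means the final layer.

```python
# Toy sketch: SD1.5's CLIP text encoder has 12 transformer layers; clip skip
# selects which layer's hidden states condition the diffusion model.
layers = [f"clip_hidden_layer_{i}" for i in range(1, 13)]

def pick_layer(hidden_layers: list[str], clip_skip: int = 1) -> str:
    """clip_skip=1 → last layer; clip_skip=2 → second-to-last; and so on."""
    return hidden_layers[-clip_skip]

print(pick_layer(layers, 1))  # → clip_hidden_layer_12
print(pick_layer(layers, 2))  # → clip_hidden_layer_11
```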