I have been experimenting with the “Casual Photo” generator, with mixed results. I find that if I am very careful, I can avoid extra limbs, weird fingers, and so on, but once I get too specific with my descriptions, all I get back is cartoons, whereas I really want realistic photographs. For example:
“An editorial photograph of 58 year old tall slim English woman standing in the reception of a high end hotel.”
This returns lots of low quality results, but a couple that are actually photographic (in the sense that, but for minor rendering details that don’t stand out, such as the texture of the skin, the lighting, and the colour, it would look like a photograph). This one, for example:
Once I add a bit more detail, however, all I get is low quality, cartoon-like results with plastic-looking, unnatural skin and hair: a cross between an android and a 20 year old woman’s Instagram posts (despite specifically asking for a late middle-aged woman):
“Beautiful 58 year old white English woman with dark silver hair, large soulful brown eyes who is fully dressed but has slightly saggy boobs and broad hips, other than that, she is of an average build, and has a pleasantly plump face. She is dressed in a quirky way that shows she is artistic and intellectual. She looks kind but also strong. She is standing, smiling slightly, outside a stucco-fronted terraced house in Chelsea, London.”

This is exactly the TYPE of woman I want to convey, but this does NOT look anything like a casual photo; it is very easy to tell straight away, from the texture of her skin and the overly perfect colouring, that she isn’t a real person. I would say this is more like a cartoon, even if it isn’t full-on stomach-turning anime. Note that I don’t particularly care about the weird hands; it is the lack of realism that bothers me, which has been dramatically worse since the terrible “upgrade” a few months ago. And that was the best of a bad bunch. Most came out looking like this:

Which is about as realistic as a blow-up doll. On top of that, they have made her look about 20, and like an Instagram attention-seeker at that (goes to the bathroom to vomit).
This also seems to happen as soon as I add other people to the picture, or when I have more than one instance of Perchance open at once, perhaps because I am using too much server power? Could someone please explain? I am happy to generate things more slowly if that means higher photographic quality.
So the tl;dr version is this:

- When is Perchance going to return to the (relatively) high quality images of several months ago? When the update was done to the silly story-creator thing (for the terminally unimaginative who are incapable of coming up with their own stories), it was said that an upgrade to the photo generator would follow immediately after; instead, we have a permanent downgrade.
- Within the limits of what we have now, how can I get higher quality pictures and photos?
- Is there a casual photo option that actually generates a casual photo, as in one that could be taken on a phone, rather than a cartoon image or a heavily filtered teenage Instagram image?
-
Idk
-
Yea, it’s possible; you could use negative prompting for that.
-
If you’re asking for an existing style prompt, then no…
Hi Arch, thanks for your response.
I actually find that negative prompting makes it worse, in keeping with the principle above: the more detail I specify, the less accurate the output is in terms of common sense, anatomy, and so on. If I specify nothing negative, the results are mostly at least anatomically accurate, but if I specify that I don’t want multiple heads or tattoos, it is likely I will get them. For example, the following prompt gets me some pretty good results, like the one below (which is still a painting, but somewhat photographic, and I actually managed to get some real photographs before):
“A candid photograph of a pretty, middle class, 56 year old French woman who is slim with broad hips and graying hair that still has some of the original brown. She has prominent crow’s feet around her eyes and her skin accurately reflects her age all over. She is dressed with casual, artsy elegance, in a blouse, silk scarf, leather jacket, baggy cords, and with brown boots. She is standing at the bar of a cafe in Paris. Her hair is like a kaleidoscope of different colours, but has grayed considerably.”

Once I include the following negative prompt: “tattoos. writing. multiple pictures. bad anatomy. cartoons. anime. unrealistic skin. instagram-style filters. drawn images. a painting. purely gray hair. low quality hands. nudity.”, there is no real improvement; if anything, the quality is slightly lower (the image below is representative), though at least the results were all vaguely anatomically correct:

Once I add the instruction to include another person, all accuracy goes out the window:
“A candid photograph of a pretty, middle class, 56 year old French woman who is slim with broad hips and graying hair that still has some of the original brown. She has prominent crow’s feet around her eyes and her skin accurately reflects her age all over. She is dressed with casual, artsy elegance, in a blouse, silk scarf, leather jacket, baggy cords, and with brown boots. Her hair is like a kaleidoscope of different colors, but has grayed considerably. She is sitting discussing philosophy in a cafe in Paris with her 30 year old male lover. The lover is blond, clean-shaven and is wearing a tweed jacket, blue jeans and subtle sneakers.” (Same negative prompt as above)
At best, it produces this:

But mostly garbage like this:

Is there a guide that discusses how best to use negative prompts? Or how to prompt at all? I am frustrated because I spent ages yesterday evening getting amazing, photographic results, and tonight it has all deteriorated. This leads me to suspect the servers are deliberately giving me bad results due to high traffic, or for some other reason that depends on the time of day.
The main issue is that you are dealing with a large AI model at the end of the day, so what works in, for example, Craiyon would not work here 1:1. Keep in mind that what happens under the hood is that the model takes your input and tries to relate it to what is tagged with those terms in its training data. The prevalence of “Instagram plastic dolls” and similar is probably due to the input having some detailed anatomical descriptors.
That being said, the best way to debug this is simply to check what works for others in other generators. For example, here is a quick run in AI Photo Generator with an apparently very minimal prompt:

This is probably far from the quality you want, but it gives you a hint at “how” those images are being made if you click on the top left corner of any of them. There you may see something like this:

Just to copy the prompt:

“Old lady drinking coffee in a Parisian bistro, cinematic shot, dynamic lighting, 75mm, Technicolor, Panavision, cinemascope, sharp focus, fine details, 8k, HDR, realism, realistic, key visual, film still, cinematic color grading, depth of field. Overall, it's an absolute world-class cinematic masterpiece. It's an aesthetically pleasing cinematic shot with impeccable attention to detail and impressive composition.”

You see that there is a lot more there than what is actually in the original prompt? If you use one of those generators, the inclusion of photographic terms such as “cinemascope” or “HDR” may yield results that can be beneficial or harmful. Ideally, you want to take a look at the full prompts and then test on a bare-bones image generator, so you have more control over the output.
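To illustrate what such a “style” is likely doing under the hood (this is a hypothetical sketch, not Perchance’s actual code, and the function name is made up), it is usually nothing more than a wrapper that pads your text with those stock terms before the model ever sees it:

```python
def apply_cinematic_style(user_prompt: str) -> str:
    """Hypothetical example of what a 'style' preset might do:
    silently append a stock list of photographic keywords."""
    suffix = ("cinematic shot, dynamic lighting, sharp focus, fine details, "
              "8k, HDR, realism, realistic, film still, cinematic color grading")
    return f"{user_prompt}, {suffix}"

# "Old lady drinking coffee in a Parisian bistro" becomes a long prompt
# much like the one quoted above:
print(apply_cinematic_style("Old lady drinking coffee in a Parisian bistro"))
```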
Now, text-to-image is different from text-to-text or text-to-code: you want to be as terse as possible, almost as if you were making a shopping list. For example, the following prompt:
- Realism
- Realistic
- Photographic shot
- Middle class
- 56 year old French woman
- Slim with broad hips
- Graying hair
- Prominent crow’s feet around her eyes
- Dressed with casual
- Silk scarf
- Leather jacket
- Baggy cords
- Standing at the bar of a cafe in Paris

yields the following for seed: 354188953 and guidanceScale: 1:

And I get it, it may not be up to your expectations, but you see how it makes it infinitely easier to debug which term leads the model where you want it to go.
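Perchance does not expose its backend, but if you want to reproduce this kind of seed / guidanceScale experiment locally, a Stable-Diffusion-style pipeline via Hugging Face diffusers looks roughly like this (a minimal sketch; the checkpoint name is just an example, and there is no guarantee Perchance uses the same model):

```python
import torch
from diffusers import StableDiffusionPipeline

# Example checkpoint only; Perchance's actual model is unknown.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The "grocery list" prompt from above, as one comma-separated string.
prompt = ("Realism, Realistic, Photographic shot, Middle class, "
          "56 year old French woman, Slim with broad hips, Graying hair, "
          "Prominent crow's feet around her eyes, Dressed with casual, "
          "Silk scarf, Leather jacket, Baggy cords, "
          "Standing at the bar of a cafe in Paris")
negative = "cartoon, anime, painting, plastic skin, instagram filter"

# Fixing the generator's seed pins down the initial noise.
generator = torch.Generator("cuda").manual_seed(354188953)

image = pipe(
    prompt,
    negative_prompt=negative,  # note: at guidance_scale <= 1, classifier-free
    guidance_scale=1.0,        # guidance is off, so the negative prompt is
    generator=generator,       # effectively ignored; raise it to ~7 to compare
).images[0]
image.save("cafe_test.png")
```

With the seed held fixed, you can change one list item at a time and see exactly what each term does.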
The best advice I can give you is to look at the many different generators out there and check what prompt is linked to a “style”, because surely what you are looking for, someone has already figured out and pasted into some “Photograph realistic style”, or at least it can serve as a reference point.
That is a very helpful post. One thing I don’t understand is the “seed” idea. I assumed it was so that you can get the same person over and over again, but I tried including the seed in a new iteration, just by including “(seed:::64482721)”, and I got a completely new set of people. I did find that more terse descriptions helped a lot, which is strange, because some of the generators say to use as much detail as you like, and the AI brain text generator thing also works on this principle. It is frustrating that I find a great granny (well, more middle-aged lol) and can’t generate her again in other settings, so info about seeds would help feed my weakness for elegant, cultured grannies!
Well… that whole thing is an entire rabbit hole. You see (and I’m trying to be as compact as possible, as there are a million videos and pieces of documentation on the matter), a model like this takes the inputs, and the order of the inputs, and tries to “correlate” them with something in a data bank. This process is called “tokenization”: it basically turns “The orange cat is sleeping” into “A + B + C + D + E”, where each variable is a “token”, often a single word, since in the simplest case the model breaks the text on whitespace. With some training, though, “The cat” can be a single token, leading to a whole other universe of possible replies branching “cat” off from “The cat”.

This is why (naively) some people recommend “add as much detail as possible”, in the sense of something like “An old lady in Paris, discussing an intellectually difficult topic such as philosophy with a young blonde man”, instead of “old lady with blonde young man, discussing, focused, Paris”. Both yield different results, but one is driven a lot by the context of articles, prepositions, and whatnot, making it a nightmare to debug. Again, be very descriptive, but separating things allows for easier “debugging”, if you will.

Also, I should mention that repeating a word does have an effect: you’ll see that the results from “old lady, scarf, drinking wine” are not the same as from “old lady, scarf, scarf, scarf, drinking wine”. That’s why I emphasize that the “grocery list” approach is better, as you can treat generating an image like building a Lego set and see what piece does what.
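As a side note, real tokenizers usually split on sub-word pieces rather than plain whitespace. If you have the transformers library installed, you can peek at how CLIP (the text encoder family used by Stable-Diffusion-style models) tokenizes a prompt; this is just an illustration, not necessarily what Perchance runs:

```python
from transformers import CLIPTokenizer

# The standard CLIP checkpoint; downloads on first run.
tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

# Common words map to single tokens (the '</w>' marks a word boundary)...
print(tok.tokenize("The orange cat is sleeping"))

# ...while rarer words may be broken into several sub-word pieces.
print(tok.tokenize("kaleidoscope"))
```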
Now, regarding the seed… that’s another whole problem. There is a better explanation in a video by Wolfram (I don’t remember which one it was), but pretty much, the seed locks you into a “potential state”, not a single output, if that makes sense. So if you reroll a seeded image, you may get, say, five diametrically different outputs with some accessory variations, plus the occasional eldritch abomination of the model mixing them, but no more than that. So with a seed you can find the exact granny you found once, but you may still need the luck of the draw. The reason for this is actually a bit complex, and I’ll admit I don’t fully get it, but I recall it also being an issue in other machine-learning models such as Random Forests, where seeds would not always yield a 1:1 result.
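A toy sketch of what the seed does and does not fix (using NumPy randomness as a stand-in for the diffusion model’s initial noise): the seed makes the starting randomness reproducible, but everything the prompt contributes afterwards still changes the outcome.

```python
import numpy as np

SEED = 64482721  # the seed from the post above

# Same seed, same "initial noise": this part is perfectly reproducible.
noise_a = np.random.default_rng(SEED).standard_normal((4, 4))
noise_b = np.random.default_rng(SEED).standard_normal((4, 4))
print(np.array_equal(noise_a, noise_b))  # True

# But the final image is this noise *plus* prompt-driven denoising steps,
# so the same seed with a different prompt still lands somewhere else,
# which is why the seed alone won't pin down the exact same granny.
```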
Then again, nothing beats downloading the image! A fun feature that Perchance has is that all images are encoded in base64, so you can right-click a generated image, select “Copy Link”, take the gargantuan link, paste it into a .txt file, and then pass that gargantuan string of text to a converter to have it on your drive, or even use it directly in an app or HTML!
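For example, a minimal sketch in Python (assuming you saved the copied link into a file named image_link.txt): a base64 data URL is just a header plus the encoded bytes, so decoding it back to a normal image file only takes the standard library.

```python
import base64

# The copied link looks like: data:image/jpeg;base64,/9j/4AAQSkZJRg...
data_url = open("image_link.txt").read().strip()

header, encoded = data_url.split(",", 1)  # split off the "data:...;base64" header
with open("granny.jpg", "wb") as f:
    f.write(base64.b64decode(encoded))    # decode and write the raw image bytes
```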