I trained a digital clone and it aged fifteen years

The training finished clean. Loss fell from 11.4 to 0.44 over 1500 steps on an A100: seventy-six minutes, LoRA rank 16, 69 curated face images filtered by embedding similarity. Then I ran inference.

The person in the output was me. But fifteen years older.

Every constant becomes a defining trait

Every caption started with the same block:

a photo of f3l1p3, a 40s brazilian man, short dark wavy hair,
full beard with graying chin, wearing dark rectangular glasses, ...

“Graying chin” appeared in 100% of the training samples. So the model bound that descriptor to the identity token instead of treating it as a per-photo detail.
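One way to catch this before training is to count which caption phrases appear in every sample. The sketch below is a minimal audit, not the tooling used for this run: the function name `constant_phrases`, the comma-based phrase splitting, and the three sample captions are all assumptions made for illustration.

```python
from collections import Counter


def constant_phrases(captions, threshold=1.0):
    """Return comma-separated phrases found in at least `threshold` of captions."""
    counts = Counter()
    for caption in captions:
        # Count each phrase once per caption, normalised to lowercase.
        phrases = {p.strip().lower() for p in caption.split(",") if p.strip()}
        counts.update(phrases)
    total = len(captions)
    return sorted(p for p, n in counts.items() if total and n / total >= threshold)


# Invented sample captions (assumption): only the repeated phrases mirror the post.
captions = [
    "a photo of f3l1p3, full beard with graying chin, smiling, outdoors",
    "a photo of f3l1p3, full beard with graying chin, serious, office",
    "a photo of f3l1p3, full beard with graying chin, laughing, beach",
]

# Anything listed here besides the identity phrase is at risk of being memorized.
print(constant_phrases(captions))
```

Lowering `threshold` to something like 0.8 would also flag phrases that are nearly constant, which may matter just as much.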

The result at inference was systematic:

Scale 1.0: pure noise. Overfitting overwhelmed the base model.
Scale 0.5: faded and elderly.
Scale 0.3: recognizable structure, but aged roughly fifteen years.
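Assuming "scale" here means the usual LoRA strength multiplier applied at inference (the post does not define it), the effect on each adapted weight matrix can be written as below. $W_0$ is the frozen base weight, $B$ and $A$ are the trained low-rank factors with rank $r = 16$, $\alpha$ is the LoRA alpha hyperparameter (its value is not given here), and $s$ is the scale.

```latex
W(s) = W_0 + s \cdot \frac{\alpha}{r} \, B A, \qquad s \in \{1.0,\ 0.5,\ 0.3\}
```

Under that reading, lowering $s$ shrinks the whole learned update uniformly. It cannot keep the identity part and drop the age part, which is consistent with the output at 0.3 still looking older.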

The model learned who I was from what stayed constant across samples. I made the wrong thing constant.

Vary everything you do not want memorized

Captions must vary everything the face does not define. Pose, expression, lighting, scene, clothing: all should vary. Whatever you repeat in 100% of samples, the model will bind to the identity token. If you want the model to learn a face, stop teaching it an age.
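As a rough sketch of that rule, a caption builder can hold only the identity token constant and take every other detail from per-photo annotations. The helper `build_caption`, its five fields, and the annotation values are hypothetical. Leaving fixed physical descriptors out of the caption entirely is one reading of the advice, not something tested in this run.

```python
IDENTITY_TOKEN = "f3l1p3"  # trigger token from the original captions


def build_caption(pose, expression, lighting, scene, clothing):
    """Build a caption whose only constant part is the identity token."""
    details = [pose, expression, lighting, scene, clothing]
    return f"a photo of {IDENTITY_TOKEN}, " + ", ".join(details)


# Hypothetical per-photo annotations; every value is invented for illustration.
photos = [
    ("three-quarter view", "smiling", "soft window light", "kitchen", "grey t-shirt"),
    ("profile", "neutral expression", "harsh midday sun", "city street", "denim jacket"),
    ("facing camera", "laughing", "warm lamp light", "living room", "black hoodie"),
]

captions = [build_caption(*p) for p in photos]
for caption in captions:
    print(caption)
```

No age, hair, or beard descriptor appears in any caption, so nothing but the token itself repeats across samples.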

This was a dead end. The pivot was zero-shot face injection: a reference photo at inference time, no training at all, no captions to get wrong. When better tools exist, the LoRA route is not worth fixing.

Takeaway

A model learns whatever you hold constant. If you want it to learn identity, you have to vary everything else.