post-mortem, dead-end, image-gen, fine-tuning

I trained a digital clone and it aged fifteen years

LoRA identity training collapsed onto an age because one descriptor appeared in 100% of captions. The model learns whatever you hold constant. Vary everything you do not want memorized.

The training finished clean. Loss converged from 11.4 to 0.44 over 1500 steps on an A100. Seventy-six minutes, rank 16, 69 curated face images filtered by embedding similarity. Then I ran inference.

The person in the output was me. But fifteen years older.

Every constant becomes a defining trait

Every caption started with the same block:

a photo of f3l1p3, a 40s brazilian man, short dark wavy hair,
full beard with graying chin, wearing dark rectangular glasses, ...

“Graying chin” appeared in 100% of the training samples. So the model bound that descriptor to the identity token instead of treating it as a per-photo detail.

The results at inference were systematic across LoRA scales:

  • Scale 1.0: pure noise. Overfitting swamped the base model.
  • Scale 0.5: faded and elderly.
  • Scale 0.3: recognizable structure, but aged roughly fifteen years.

The model learned who I was from what stayed constant across samples. I made the wrong thing constant.
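
One way to catch this before training is to audit the captions for phrases that never change. The following is a minimal sketch, not the tooling used here: it assumes captions are comma-separated descriptor lists like the block above, and the function name `constant_descriptors` and the three sample captions are hypothetical.

```python
from collections import Counter

def constant_descriptors(captions, threshold=1.0):
    """Return phrases present in at least `threshold` (a fraction) of captions."""
    counts = Counter()
    for caption in captions:
        # Split on commas and count each phrase at most once per caption.
        phrases = {p.strip().lower() for p in caption.split(",") if p.strip()}
        counts.update(phrases)
    total = len(captions)
    return sorted(p for p, n in counts.items() if n / total >= threshold)

# Hypothetical captions in the style of the training set described above.
captions = [
    "a photo of f3l1p3, a 40s brazilian man, full beard with graying chin, smiling outdoors",
    "a photo of f3l1p3, a 40s brazilian man, full beard with graying chin, reading at a desk",
    "a photo of f3l1p3, a 40s brazilian man, full beard with graying chin, side profile in low light",
]

for phrase in constant_descriptors(captions):
    print(phrase)
```

Anything this prints besides the trigger phrase is a candidate for being bound to the identity token. Here it would flag both the age and the graying chin.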

Vary everything you do not want memorized

Captions must vary everything the face does not define. Pose, expression, lighting, scene, clothing: all should vary. Whatever you repeat in 100% of samples, the model will bind to the identity token. If you want the model to learn a face, stop teaching it an age.
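
As a rough illustration of that rule, the sketch below builds each caption from per-photo annotations and keeps only the trigger phrase constant. It is one possible reading of the advice, not a tested recipe: the annotation keys, the `build_caption` helper, and the sample photos are all hypothetical, and age and beard descriptors are left out on purpose.

```python
TRIGGER = "f3l1p3"  # identity token from the captions above

# Hypothetical per-photo attributes; each should describe only what is in that photo.
VARYING_KEYS = ("expression", "pose", "lighting", "clothing", "scene")

def build_caption(photo):
    """Join the constant trigger phrase with whatever attributes this photo has."""
    parts = [f"a photo of {TRIGGER}"]
    parts += [photo[key] for key in VARYING_KEYS if photo.get(key)]
    return ", ".join(parts)

# Hypothetical annotations for two training photos.
photos = [
    {"expression": "smiling", "lighting": "soft window light", "scene": "in a kitchen"},
    {"pose": "side profile", "clothing": "wearing a gray hoodie", "scene": "on a city street"},
]

for photo in photos:
    print(build_caption(photo))
```

The only phrase shared by every caption is the trigger itself, so the token has nothing but the face to attach to.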

This was a dead end. The pivot was zero-shot face injection: a reference photo at inference time, no training at all, no captions to get wrong. When better tools exist, the LoRA route is not worth fixing.

Takeaway

A model learns whatever you hold constant. If you want it to learn identity, you have to vary everything else.