Explanations
references to father figures and familial relationships
Negative Logits
Token      Logit
licet      -0.74
uilla      -0.69
__))       -0.65
Và         -0.65
houette    -0.64
Pickett    -0.63
öss        -0.61
delu       -0.61
vrons      -0.60
geries     -0.60
Positive Logits
Token      Logit
fathers    1.56
Fathers    1.55
father     1.50
FATHER     1.47
Father     1.41
Father     1.27
father     1.23
père       1.16
Fath       1.14
Père       1.13
Activations Density: 0.091%