Explanations
references to fathers and father figures
Negative Logits
itſelf     -1.12
iſt        -1.09
ſche       -1.00
myſelf     -0.97
Majefty    -0.91
Diſ        -0.91
ſelves     -0.91
beſt       -0.90
ſelf       -0.88
houſe      -0.88
Positive Logits
father      0.93
Father      0.92
Father      0.83
father      0.83
père        0.74
FATHER      0.69
fathers     0.65
fathers     0.63
Mr          0.57
::::::::    0.55
Activations Density: 0.078%