INDEX
Explanations
references to paternal figures or authority figures
references to "fathers" and parental figures in various contexts
Negative Logits
FW          -0.83
é¾          -0.74
Flavoring   -0.71
IFF         -0.71
CHAT        -0.71
ELL         -0.68
eq          -0.68
mble        -0.67
ORT         -0.64
AH          -0.63
Positive Logits
father      1.22
Fathers     1.05
sonian      0.89
parents     0.88
fathers     0.87
ures        0.86
hood        0.81
stein       0.79
sson        0.79
essor       0.77
Activations Density
0.005%