Explanations
references to fathers and father figures
Negative Logits
Token         Logit
CONT          -0.78
atility       -0.77
clude         -0.75
actionDate    -0.71
phis          -0.67
clusions      -0.65
EVA           -0.65
cluded        -0.65
arnaev        -0.65
mble          -0.64
Positive Logits
Token         Logit
liest         0.88
hesis         0.81
patriarch     0.80
dad           0.80
daddy         0.79
uncle         0.79
Dad           0.77
father        0.76
iji           0.76
uncle         0.75
Activations Density 0.009%