Explanations
mentions of family members, particularly mothers and fathers
Negative Logits
Token            Logit
ècie             -0.66
itaire           -0.56
baijan           -0.55
>=",             -0.54
autorytatywna    -0.52
IEWS             -0.51
όμε              -0.51
ometre           -0.50
nościo           -0.49
שוליים           -0.49
Positive Logits
Token            Logit
dad              1.26
father           1.22
dads             1.16
Dad              1.12
fathers          1.12
parents          1.09
Dad              1.08
mom              1.08
moms             1.05
FATHER           1.02
Activations Density: 0.140%