INDEX
Explanations
references to familial relationships, especially involving fathers
Negative Logits
licet      -0.80
الحره      -0.71
Jovi       -0.71
trover     -0.70
öss        -0.68
Và         -0.67
houette    -0.66
delu       -0.66
coils      -0.65
ulets      -0.64
Positive Logits
father     1.78
Father     1.66
fathers    1.66
FATHER     1.66
Fathers    1.61
Father     1.50
father     1.46
père       1.32
Vater      1.22
padre      1.21
Activations Density: 0.047%