INDEX
Explanations
references to specific individuals or proper nouns
after first names or articles
Negative Logits
préc            -0.64
Theſe           -0.62
lèvres          -0.59
nationaux       -0.59
récentes        -0.59
marquées        -0.56
concernés       -0.56
للاسماء         -0.56
AssemblyTitle   -0.55
barnen          -0.55
Positive Logits
Indias      0.97
kespea      0.94
Smiths      0.92
cys         0.84
Americas    0.83
didnt       0.82
childs      0.82
wasnt       0.79
Whats       0.79
couldnt     0.79
Activations Density 0.205%