INDEX
Explanations
references to individuals or personal pronouns
Negative Logits
æk        -0.17
nun       -0.17
erva      -0.15
eck       -0.15
smarty    -0.15
ormsg     -0.15
erland    -0.14
ecko      -0.14
ekk       -0.14
ataka     -0.14
Positive Logits
u         0.17
ensus     0.16
Nah       0.15
Duch      0.15
cont      0.15
Ez        0.14
Le        0.14
-picture  0.14
inst      0.14
zl        0.14
Activations Density: 0.005%