Explanations
references to accounts or discussions involving political or legal matters
Negative Logits
omite     -0.16
ç¨        -0.16
indow     -0.16
nelle     -0.15
aub       -0.15
heel      -0.14
sei       -0.14
pecting   -0.14
ç¼        -0.14
andi      -0.14
Positive Logits
ulus      0.14
ald       0.14
ocrates   0.13
ameron    0.13
798       0.13
"go       0.13
Ziel      0.13
exels     0.13
itesse    0.13
âĤ¹       0.13
Activations Density 0.373%
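
A few tokens in the logit lists (`ç¨`, `ç¼`, `âĤ¹`) look like encoding debris, but they are more likely byte-level BPE tokens shown in a GPT-2-style printable form, where each displayed character stands for one raw byte. The source does not name the tokenizer, so this is an assumption. The sketch below rebuilds that byte-to-character table and inverts it; `token_to_bytes` and `CHAR_TO_BYTE` are hypothetical names chosen for illustration.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table in the GPT-2 byte-level BPE style.

    Assumption: the dashboard's tokenizer follows this convention.
    """
    # Bytes that are already printable are displayed as themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is displayed as a code point from 256 upward.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> raw byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed byte-level token."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


if __name__ == "__main__":
    for tok in ["âĤ¹", "ç¨", "ç¼"]:
        raw = token_to_bytes(tok)
        # errors="replace" keeps incomplete UTF-8 sequences from raising.
        print(repr(tok), raw.hex(" "), repr(raw.decode("utf-8", errors="replace")))
```

Under this assumption, `âĤ¹` is the three bytes e2 82 b9, which is the UTF-8 encoding of the rupee sign ₹, while `ç¨` and `ç¼` are incomplete UTF-8 prefixes (e7 a8 and e7 bc) that only form a full character together with a following token.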