INDEX
Explanations
references to political leaders and their actions or appearances
Negative Logits
079             -0.16
æİĽ             -0.15
nds             -0.15
haps            -0.15
independence    -0.14
haus            -0.14
MUT             -0.14
ständ           -0.14
chor            -0.14
UNS             -0.14
Positive Logits
-sama           0.17
andan           0.16
Schwarz         0.15
.scalablytyped  0.15
dk              0.15
Dank            0.14
zig             0.14
ienen           0.14
personally      0.14
BackColor       0.14
Activations Density 0.299%