INDEX
Explanations
mentions of official titles or positions of authority
statements made by officials or representatives
Negative Logits
abiding                  -0.54
tumblr                   -0.51
successfully             -0.50
Âł Âł Âł Âł              -0.50
pires                    -0.49
diaper                   -0.49
Âł Âł Âł Âł Âł Âł Âł Âł  -0.48
miracle                  -0.48
stret                    -0.47
Articles                 -0.46
Positive Logits
]."        0.64
anton      0.60
adding     0.59
].         0.57
zinski     0.54
¥µ         0.54
>.         0.53
}.         0.52
heny       0.52
izer       0.52
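This page does not state how the logit scores above are computed. One common approach for sparse-autoencoder features, assumed here and not confirmed by the dashboard, is to project the feature's decoder direction through the model's unembedding matrix and rank vocabulary tokens by the resulting score. The pure-Python sketch below illustrates that ranking step; `logit_effects`, `top_and_bottom`, and all toy values are hypothetical.

```python
# Hypothetical sketch: rank a feature's direct logit effects.
# Assumption: the effect on token t is dot(decoder_dir, unembed[:, t]),
# i.e. the feature's decoder vector projected through the unembedding.
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """unembed is d_model x vocab; returns one score per vocabulary token."""
    vocab = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
            for t in range(vocab)]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda p: p[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 tokens (all values made up).
tokens = ["a", "b", "c", "d"]
decoder_dir = [1.0, 0.5]
unembed = [[0.2, -0.4, 0.6, 0.0],
           [0.4, 0.2, -0.2, -1.0]]
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), tokens, k=2)
```

With these toy numbers, `d` and `b` come out as the most negative tokens and `c` and `a` as the most positive; a real dashboard would do the same ranking over the full vocabulary and show the top ten of each.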
Activations Density 0.660%
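Activation density is usually read as the percentage of sampled tokens on which the feature fires; the firing threshold (above zero) and the token sample are assumptions here, since the page does not specify them. At 0.660%, the feature would fire on roughly 66 of every 10,000 tokens. A minimal sketch with a hypothetical function name and synthetic data:

```python
# Hypothetical sketch: activation density as a percentage of tokens.
# Assumption: a feature "fires" when its activation is strictly above threshold.
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds the threshold."""
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Synthetic data: 66 firing tokens out of 10,000.
acts = [1.0] * 66 + [0.0] * 9934
density = activation_density(acts)
```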