INDEX
Explanations
politically-related names and terms
mentions of political figures and related contexts
Negative Logits
Niet        -0.65
anwhile     -0.58
Hiroshima   -0.58
mble        -0.54
FUL         -0.52
semble      -0.49
eatures     -0.48
Wem         -0.48
tml         -0.47
oenix       -0.46
Positive Logits
's          0.76
Care        0.63
care        0.62
Semitism    0.61
ís          0.61
omics       0.55
meddling    0.54
´           0.53
anymore     0.52
ani         0.50
Activations Density 0.499%