Explanations
words related to politics and conflict, especially regions and events tied to political tensions and military action
references to political regimes and conflicts, particularly involving North Korea and Russia
Negative Logits
Sund     -0.57
reek     -0.51
Nar      -0.51
ンジ     -0.50
GW       -0.49
Samar    -0.49
ベ       -0.49
minist   -0.48
Sym      -0.48
oret     -0.48
Positive Logits
.''      0.88
.''.     0.87
.        0.84
.[       0.75
."       0.74
.'       0.73
。       0.69
'.       0.68
.</      0.67
.(       0.67
Activations Density 1.835%