INDEX
Explanations
mentions of political figures and government actions
Negative Logits
Translation    -0.73
Els            -0.68
ãĥ¼ãĥĨ         -0.67
$.             -0.66
é¾įå           -0.60
ãĤ¦ãĤ¹         -0.59
Ô              -0.59
+.             -0.58
ãĤ¯            -0.56
Sov            -0.56
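Several entries in the Negative Logits list (for example `ãĤ¯`) look like tokens printed in a byte-level BPE display form rather than as readable text. The sketch below assumes the GPT-2-style byte-to-character mapping, which the page itself does not state, and inverts it to recover the underlying text. The names `bytes_to_unicode` and `decode_token` are illustrative and not part of any tool referenced here.

```python
def bytes_to_unicode():
    """Build the GPT-2-style byte -> display-character table (assumed mapping)."""
    # Bytes that are already printable keep their own code point.
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    mapping = {}
    shift = 0
    for b in range(256):
        if b in keep:
            mapping[b] = chr(b)
        else:
            # Non-printable bytes are moved to code points starting at 256.
            mapping[b] = chr(256 + shift)
            shift += 1
    return mapping


# Inverse table: display character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display):
    """Turn a displayed token back into text; incomplete UTF-8 becomes U+FFFD."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤ¯"))      # ク
print(decode_token("ãĥ¼ãĥĨ"))  # ーテ
print(decode_token("ãĤ¦ãĤ¹"))  # ウス
print(decode_token("Sov"))      # Sov
```

Under this assumption, a token such as `é¾įå` decodes to 龍 followed by a replacement character, because the token ends partway through a multi-byte UTF-8 sequence. Likewise `Ô` is a single lead byte with no complete character of its own.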
Positive Logits
announced      0.86
reacted        0.84
celebrates     0.76
awoke          0.75
announces      0.74
unveiled       0.74
apologised     0.73
warned         0.73
expects        0.72
reportedly     0.72
Activations Density 0.769%
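The page does not define how the activation density is computed. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below shows that calculation; the function name `activation_density` and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy example: the feature fires on 1 of 4 tokens.
density = activation_density([0.0, 0.0, 1.5, 0.0])
print(f"{density:.3%}")  # 25.000%
```

On this reading, a density of 0.769% would mean the feature is active on roughly 77 of every 10,000 sampled tokens.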