INDEX
Explanations
references to Russia
Negative Logits
Token      Logit
Label      -0.75
enance     -0.74
inances    -0.71
Score      -0.71
oyer       -0.71
tc         -0.71
sbm        -0.70
erving     -0.69
wait       -0.68
own        -0.68
Positive Logits
Token        Logit
olig         0.96
Federation   0.96
Orthodox     0.91
rou          0.87
Kremlin      0.81
kaya         0.80
Russians     0.80
Ð            0.80
citiz        0.78
Russian      0.77
Activations Density 0.020%
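
The density figure above can be read as the fraction of token positions on which the feature fires. The following is a minimal sketch of that reading, assuming density is defined as the share of positions with activation above a threshold; the function name `activation_density`, the `threshold` parameter, and the toy activations are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: 'density' means (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (illustrative only): 2 active positions out of 10,000.
toy = [0.0] * 9998 + [1.3, 0.7]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.020%
```

Under this definition, a density of 0.020% corresponds to roughly 2 active positions per 10,000 tokens, which fits a sparse feature that responds to a narrow topic.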