INDEX
Explanations
references to Russian entities
occurrences of the word "Russian."
Negative Logits
Token      Logit
Score      -0.88
ttes       -0.76
enance     -0.75
erer       -0.75
sbm        -0.73
Label      -0.72
Beck       -0.72
ayer       -0.71
oyer       -0.71
eter       -0.70
Positive Logits
Token        Logit
Federation   1.10
olig         0.95
Orthodox     0.95
istani       0.84
rou          0.83
embassy      0.82
separatists  0.82
ambassador   0.79
Embassy      0.79
consulate    0.77
Activation Density: 0.025%