INDEX
Explanations
references to Vladimir Lenin and Vladimir Putin
Negative Logits
åĮ        -0.16
umba      -0.16
iais      -0.16
emente    -0.16
onnen     -0.15
umph      -0.15
alian     -0.15
olis      -0.15
inear     -0.15
inely     -0.15
Positive Logits
imir      0.27
Putin     0.22
islav     0.18
Vladim    0.17
mir       0.17
Putin     0.17
ãĥĨãĥ«    0.15
Lenin     0.15
окон      0.15
ãĥĥãĥĹ    0.15
Activations Density 0.007%
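
Some of the tokens in the logit lists (for example åĮ, ãĥĨãĥ« and ãĥĥãĥĹ) look like raw byte-level BPE token strings rather than readable text. The sketch below shows one way to map such strings back to text. It assumes the tokens use the GPT-2-style byte-to-character table, which the page itself does not state. The function names are hypothetical and not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the byte -> character table used by GPT-2-style byte-level BPE.

    Printable bytes map to themselves; every other byte is shifted to a
    code point at or above U+0100 so each byte has a visible stand-in.
    """
    printable = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # U+00A1 .. U+00AC
        + list(range(0xAE, 0x100))  # U+00AE .. U+00FF
    )
    table = {b: chr(b) for b in printable}
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Decode a byte-level token string to text (assumption: GPT-2-style table).

    Tokens containing characters outside the table (e.g. already-decoded
    Cyrillic) are returned unchanged. Incomplete UTF-8 sequences are shown
    with the U+FFFD replacement character.
    """
    if any(ch not in UNICODE_TO_BYTE for ch in token):
        return token
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥĨãĥ«", "ãĥĥãĥĹ", "åĮ", "Putin", "окон"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ãĥĨãĥ« and ãĥĥãĥĹ decode to short katakana fragments. åĮ is only the first two bytes of a three-byte UTF-8 character, so it cannot be decoded on its own. Plain tokens such as Putin and окон pass through unchanged.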