INDEX
Explanations
references to political figures and their associated actions or statements
Negative Logits
цеп             -0.17
.updateDynamic  -0.16
ört             -0.16
ammo            -0.15
enci            -0.14
jk              -0.14
ahun            -0.14
ephir           -0.14
ForRow          -0.14
ョ              -0.14
Positive Logits
former    0.60
Former    0.54
former    0.49
Former    0.49
retired   0.37
býval     0.33
erst      0.30
formerly  0.29
سابق      0.28
ex        0.27
Activations Density 0.148%