INDEX
Explanations
references to political actions and their implications
Negative Logits
auge          -0.16
kola          -0.15
лада          -0.15
orgen         -0.15
baugh         -0.14
PROCUREMENT   -0.14
rand          -0.14
wear          -0.14
Jun           -0.14
eba           -0.14
Positive Logits
aravel         0.16
äng            0.15
urum           0.15
ayi            0.14
malink         0.14
avier          0.14
Burl           0.14
istrovství     0.13
/generated     0.13
ci             0.13
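The dashboard does not say how these logit values are computed. A common convention for sparse-autoencoder features, assumed here rather than confirmed by the source, is the feature's direct logit effect: its decoder direction multiplied by the model's unembedding matrix, with the most negative and most positive entries listed. The function `top_logit_effects` and its arguments `w_dec_f`, `W_U`, and `vocab` are hypothetical names used only for this sketch, which ignores the final layer norm.

```python
import numpy as np


def top_logit_effects(w_dec_f, W_U, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    Hypothetical sketch. Assumes w_dec_f is the feature's decoder direction
    (shape [d_model]) and W_U is the unembedding matrix (shape [d_model, d_vocab]).
    The final layer norm is ignored.
    """
    effects = w_dec_f @ W_U          # one logit effect per vocabulary token
    order = np.argsort(effects)      # token indices sorted from most negative to most positive
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy usage: a 2-dimensional residual stream and a 4-token vocabulary.
w = np.array([1.0, 0.0])
W_U = np.array([[0.3, -0.2, 0.1, -0.5],
                [0.0, 0.0, 0.0, 0.0]])
neg, pos = top_logit_effects(w, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('d', -0.5), ('b', -0.2)]
print(pos)  # [('a', 0.3), ('c', 0.1)]
```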
Activation density: 1.492%
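Activation density is usually read as the share of sampled tokens on which the feature fires, so 1.492% would mean roughly 15 of every 1,000 tokens. The sketch below assumes that definition, with "fires" meaning an activation strictly above zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Hypothetical sketch. Assumes `acts` holds one feature's activations over a
    sample of tokens, flattened to one value per token.
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy usage: the feature fires on 2 of 8 tokens.
density = activation_density([0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0])
print(density)  # 25.0
```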