INDEX
Explanations
aspects related to political ideologies and their implications
Negative Logits
Token     Logit
åĿ        -0.17
529       -0.15
üzel      -0.15
èĤ        -0.14
ately     -0.14
ãĥIJãĥ¼   -0.14
athers    -0.14
ensi      -0.13
ôm        -0.13
racak     -0.13
Positive Logits
Token     Logit
ness      0.15
echn      0.14
Ỽ         0.14
rok       0.14
lek       0.14
Dich      0.14
certain   0.14
hir       0.14
hour      0.14
Mer       0.14
Activations Density 0.013%