Explanations
references to support and opposition in political contexts
Negative Logits
rana      -0.15
ebek      -0.14
ild       -0.14
tas       -0.14
iÄĻ       -0.14
ůr        -0.14
reife     -0.14
Chan      -0.13
egan      -0.13
affe      -0.13
Positive Logits
notion    0.19
against   0.17
/support  0.16
idea      0.16
notions   0.15
Tod       0.15
ÙģØª      0.15
crypt     0.15
bele      0.15
ston      0.14
Activations Density 0.131%