INDEX
Explanations
references to funding sources and think tanks in political contexts
Negative Logits
gan         -0.16
stub        -0.15
Stub        -0.15
immer       -0.15
aret        -0.15
erre        -0.13
ron         -0.13
ipa         -0.13
vest        -0.13
hybrids     -0.13
Positive Logits
ÄĽl         0.14
abb         0.14
upp         0.14
vor         0.14
kening      0.13
opensource  0.13
Hess        0.13
arto        0.13
associ      0.13
utable      0.13
Activations Density 0.098%