INDEX
Explanations
discussions around political claims and misinformation
Negative Logits
ree        -0.17
lias       -0.16
ouv        -0.16
00         -0.15
Otto       -0.14
0          -0.14
ittings    -0.14
nier       -0.14
og         -0.14
kin        -0.14
Positive Logits
ksam          0.19
atik          0.18
ailability    0.16
ienza         0.15
影            0.15
Wid           0.15
repos         0.15
ometown       0.14
Jad           0.14
_ctxt         0.14
Activations Density 0.278%