Explanations
references to political events or controversies
Negative Logits
acomment      -0.14
ÐĴÐŀ          -0.13
ügen          -0.13
ASC           -0.13
commenting    -0.13
geg           -0.13
rowned        -0.13
arser         -0.13
causa         -0.13
matrimon      -0.13
Positive Logits
'             0.24
amid          0.17
fed           0.16
says          0.15
"             0.15
VERIFY        0.15
-'            0.15
halo          0.15
:'            0.14
'--           0.14
Activations Density 0.344%
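
Activations density is commonly read as the share of token positions on which a feature fires. The following is a minimal sketch of that reading, assuming "fires" means an activation strictly above a threshold of zero. The function name, the threshold parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a position counts as active only if its value is strictly
    greater than the threshold (zero by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for one feature over 1,000 token positions:
# 997 silent positions and 3 active ones.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.344% would mean the feature is active on roughly 3 or 4 of every 1,000 tokens.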