Explanations
statements indicating claims of misinformation or inaccuracies
Negative Logits
ondo      -0.19
ATS       -0.17
atsu      -0.17
omat      -0.16
          -0.16
acha      -0.15
Holl      -0.15
odies     -0.14
armacy    -0.14
ÑģÑĭ      -0.14
Positive Logits
edn       0.17
Zw        0.16
ixe       0.15
lah       0.15
Fle       0.14
Ù쨧ÙĦ     0.14
ç©į       0.14
LEGRO     0.14
ysqli     0.14
{?>↵      0.14
Activations Density 0.167%
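
As a rough illustration of how a density figure like the one above can be read, the sketch below assumes that activation density means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; none of them come from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 token positions, 2 of which activate the feature.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

Under this reading, a density of 0.167% would correspond to the feature firing on roughly 1 in 600 tokens.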