INDEX
Explanations
negations or contradictions in statements
Negative Logits
iscard      -0.15
PLAIN       -0.14
aru         -0.14
ses         -0.13
rian        -0.13
isha        -0.13
ional       -0.13
tribunal    -0.13
Klopp       -0.13
``          -0.13
Positive Logits
psz         0.17
apot        0.16
eft         0.16
ÙĦس         0.16
aroo        0.14
remot       0.14
ept         0.14
enstein     0.14
Emerson     0.14
ymb         0.14
Activations Density: 0.094%