Explanations
negative phrases and contradictions in statements
NEGATIVE LOGITS
Token         Logit
btw           -0.17
ache          -0.16
but           -0.15
ACION         -0.15
oop           -0.15
but           -0.15
acher         -0.14
不过          -0.14
uft           -0.14
305           -0.14

POSITIVE LOGITS
Token         Logit
↵↵            0.19
WindowState   0.16
NodeType      0.15
BorderStyle   0.15
ones          0.15
unate         0.15
że            0.15
SPATH         0.15
それは        0.14
actual        0.14
Activations Density 0.393%