Explanations
negations or uncertainty in statements
Negative Logits
Stewart          -0.15
kup              -0.14
ìĻĦ              -0.14
kop              -0.14
Equivalent       -0.14
jf               -0.14
Uncategorized    -0.14
dney             -0.14
uyết             -0.14
lobals           -0.13
Positive Logits
true             0.55
true             0.45
True             0.40
TRUE             0.38
True             0.38
case             0.34
TRUE             0.33
.true            0.33
happening        0.32
verdade          0.32
Activations Density: 0.093%