Explanations
deceptive behavior and dishonesty in statements
Negative Logits
Token             Logit
出版年            -0.39
UIControlState    -0.38
ụn                -0.37
có                -0.34
</tfoot>          -0.34
demo              -0.33
presump           -0.33
Zunanje           -0.33
too               -0.32
Hohen             -0.32
Positive Logits
Token             Logit
lied              0.93
mentiras          0.84
Lying             0.81
lying             0.79
lies              0.77
lie               0.74
Lies              0.73
谎                0.73
liar              0.70
Lying             0.70
Activations Density 0.417%