Explanations
phrases expressing overt dishonesty or bold statements that are clearly untrue
Negative Logits
CodedInputStream    -0.86
насељу              -0.77
évaluateur          -0.70
بيها                -0.67
WriteBarrier        -0.66
الدراسه             -0.64
InjectAttribute     -0.63
رشف                 -0.63
useStyles           -0.62
DeleteBehavior      -0.62
Positive Logits
outright     1.04
downright    0.80
blatant      0.70
blatantly    0.66
overt        0.66
openly       0.60
gross        0.57
explicit     0.57
completely   0.57
totally      0.57
Activations Density: 0.714%