INDEX
Explanations
instances of dishonesty or discrepancies in statements
Negative Logits
Token            Logit
↵                -0.63
↵↵               -0.63
,                -0.57
parcourir        -0.55
SourceChecksum   -0.52
and              -0.48
N                -0.48
                 -0.47
.                -0.47
רושלים           -0.46
Positive Logits
Token            Logit
Italijanski      0.75
ddelweddau       0.71
BagConstraints   0.69
Vikipedi         0.68
RTGC             0.67
Inti             0.67
]--;             0.66
IPX              0.64
Enumer           0.64
theros           0.63
Activations Density 0.019%