INDEX
Explanations
references to deception or authenticity in documents and communications
Negative Logits
ốc           -0.15
Insensitive  -0.14
orgot        -0.14
linger       -0.14
Antar        -0.13
declspec     -0.13
Unchecked    -0.13
нів          -0.13
quip         -0.13
_drvdata     -0.13
Positive Logits
fake   0.69
fake   0.60
Fake   0.59
Fake   0.55
faker  0.52
false  0.47
_fake  0.47
假     0.45
(fake  0.45
.fake  0.43
Activations Density 0.482%