INDEX
Explanations
terms and phrases indicating falsehood or deception
Negative Logits
AssemblyProduct  -0.63
Tembelea         -0.61
Inhalation       -0.58
plor             -0.53
MLLoader         -0.52
GIH              -0.52
mobilité         -0.51
NamedQueries     -0.50
Beine            -0.50
tamment          -0.50
Positive Logits
false    4.54
false    3.83
False    3.77
False    3.45
FALSE    3.07
falso    2.77
FALSE    2.70
falsa    2.61
falsely  2.57
falsos   2.51
Activations Density 0.108%