INDEX
Explanations
statements and actions characterized by deception or falsehood
Negative Logits
AssemblyProduct    -0.80
виправивши         -0.76
autoreleasepool    -0.76
onOptions          -0.75
calendriers        -0.74
AssemblyTitle      -0.73
                   -0.68
>=",               -0.68
})));              -0.67
참고               -0.66
Positive Logits
false        1.87
lie          1.76
lies         1.71
lied         1.71
falsehood    1.67
mentira      1.59
false        1.58
False        1.58
fake         1.56
deception    1.54
Activations Density 0.778%
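
The density figure is not defined in this listing. A common reading, assumed here, is the fraction of sampled tokens on which the feature activates at all. The sketch below illustrates that reading on synthetic data; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires. The dashboard itself does not state its definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic sample: 1000 tokens, 8 of which activate the feature.
sample = [0.0] * 992 + [0.5] * 8
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.800%
```

Under this reading, a density of 0.778% would mean the feature fires on roughly 8 of every 1000 sampled tokens.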