INDEX
Explanations
the presence of the word "false"
Negative Logits
وتسجيلات    -0.87
tamment    -0.79
+#+#    -0.78
Hecht    -0.77
chré    -0.76
Magi    -0.75
Puig    -0.71
زیین    -0.69
---+    -0.69
EClass    -0.68
Positive Logits
false    1.17
false    1.00
False    0.94
fals    0.89
False    0.86
FALSE    0.86
FALSE    0.82
falsehood    0.79
ation    0.79
falsely    0.79
Activations Density 0.100%