Explanations
statements or claims that indicate truthfulness or validity
Negative Logits
Token          Logit
يتيمه          -0.95
AnchorStyles   -0.90
Monfieur       -0.90
Bernadette     -0.90
avoient        -0.84
صوتيه          -0.82
ejus           -0.81
sélectionnés   -0.76
<=",           -0.75
étoient        -0.75
Positive Logits
Token          Logit
True           1.26
true           1.25
TRUE           1.18
Tru            1.08
True           1.06
TRUE           1.03
Tru            1.03
stdbool        1.03
isTrue         1.00
False          0.96
Activations Density 0.082%