INDEX
Explanations
affirmative responses or expressions of agreement
Negative Logits
`).       -0.84
munk      -0.81
']").     -0.81
".        -0.80
%");      -0.80
'>"       -0.80
=")       -0.80
*/}       -0.79
();*/     -0.79
')]       -0.79
Positive Logits
Yes       2.12
yes       2.05
Yes       2.04
YES       2.02
yes       1.99
YES       1.94
YesNo     1.20
Yess      1.18
Sí        1.08
Sì        1.07
Activations Density 0.054%