Explanations
affirmations and strong agreement expressions
Negative Logits
Token    Logit
oya      -0.19
rray     -0.15
occo     -0.15
å«       -0.15
enge     -0.14
lander   -0.14
OTOS     -0.14
oba      -0.14
/apis    -0.13
ayet     -0.13
Positive Logits
Token    Logit
yes      0.37
yes      0.34
Yes      0.30
Yep      0.28
Yep      0.25
Yes      0.25
YES      0.24
=yes     0.24
Yup      0.24
.Yes     0.23
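
The two lists above read as the tokens with the lowest and highest logit values for this feature. As a minimal sketch of how such a ranking could be produced, the snippet below sorts a handful of token/logit pairs copied from those lists. The `top_k` helper and the `logit_effects` name are hypothetical, introduced only for illustration, and the assumption that the dashboard ranks by a plain sort on the logit value is not confirmed by the source.

```python
# Token/logit pairs copied from the lists above (a subset, for illustration).
logit_effects = [
    ("oya", -0.19),
    ("rray", -0.15),
    ("lander", -0.14),
    ("yes", 0.37),
    ("Yep", 0.28),
    ("YES", 0.24),
]


def top_k(pairs, k, positive=True):
    """Hypothetical helper: return the k pairs with the largest logit
    (positive=True) or the smallest logit (positive=False)."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=positive)[:k]


top_positive = top_k(logit_effects, 2, positive=True)
top_negative = top_k(logit_effects, 2, positive=False)

print(top_positive)
print(top_negative)
```

With this subset, the positive ranking is led by "yes" and the negative ranking by "oya", matching the first entry of each list.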
Activations Density: 0.084%
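
One common reading of an activation density figure is the fraction of sampled tokens on which the feature fires at all. The sketch below works under that assumption, which the source does not state. The `activation_density` function, its `threshold` parameter, and the synthetic sample of 100,000 tokens are all hypothetical and chosen only so the arithmetic reproduces 0.084%.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of tokens whose activation exceeds
    `threshold`. Assumes density means "share of tokens where the feature
    is active", which is an assumption, not a documented definition."""
    if not activations:
        return 0.0
    active = sum(1 for value in activations if value > threshold)
    return active / len(activations)


# Synthetic sample: 84 active tokens out of 100,000 (not real data).
sample = [0.0] * 99_916 + [1.5] * 84
density = activation_density(sample)

print(f"{density:.3%}")
```

Under this reading, a density of 0.084% means the feature is active on roughly 84 of every 100,000 tokens, so it is a sparse feature.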