Explanations
questions or inquiries about identity and self-reflection
Questions asked or statements made to "you," "we," or "it"
what do you think
Negative Logits
yes              -0.60
Yes              -0.60
Yes              -0.56
yes              -0.56
YES              -0.51
DebuggerNonUser  -0.50
definitely       -0.49
både             -0.49
TemporalType     -0.48
unquestionably   -0.48
Positive Logits
exactly      1.17
exactly      1.00
exactement   0.96
exatamente   0.89
Exactly      0.88
esattamente  0.88
exactamente  0.85
precies      0.84
denn         0.79
そんなに     0.78
Activations Density: 0.254%