Explanations
questions and interrogative phrases
Negative Logits
₁.        0.84
.,        0.80
。,        0.78
'.        0.77
.*;       0.77
}.        0.77
}^{+}$.   0.76
].        0.73
。.        0.73
.[        0.73

Positive Logits
?         4.73
?         4.32
؟         4.25
?"        4.04
?)        3.88
?”        3.86
?</       3.79
?\        3.75
?'        3.67
?»        3.65
Activations Density 2.282%