INDEX
Explanations
phrases expressing strong emotions or decisive actions
punctuation marks indicating exclamations, questions, and statements
Negative Logits
ancies       -0.77
sels         -0.71
imes         -0.70
ibles        -0.66
oug          -0.66
warr         -0.64
SPA          -0.64
accumulated  -0.64
negatives    -0.63
inous        -0.62
Positive Logits
Saying       0.86
[/           0.83
Then         0.82
/"           0.82
Similarly    0.81
Sometimes    0.79
Likewise     0.77
Knowing      0.77
Nobody       0.76
Conversely   0.76
Activations Density 0.127%