INDEX
Explanations
punctuation marks, particularly questions and exclamations
Negative Logits
sels      -0.76
ãĥ«       -0.69
ancies    -0.69
çͰ       -0.67
éĹĺ       -0.66
ãĥIJ       -0.65
misunder  -0.64
imar      -0.64
recl      -0.63
banned    -0.63
Positive Logits
Then       1.02
Because    0.96
Luckily    0.93
/"         0.92
Obviously  0.90
Nobody     0.90
Knowing    0.89
Sometimes  0.88
That       0.88
Which      0.85
Activations Density 0.051%