INDEX
Explanations
punctuation marks that indicate questions and exclamations
Negative Logits
nearby       -0.16
Indeed       -0.16
Meanwhile    -0.16
Meanwhile    -0.16
Dit          -0.15
sez          -0.15
indeed       -0.15
Similarly    -0.14
Similarly    -0.13
writes       -0.13
Positive Logits
ITS          0.29
its          0.26
Lets         0.25
Lets         0.25
ITS          0.25
its          0.25
Its          0.24
Its          0.24
Majority     0.20
lets         0.20
Activations Density: 0.592%