Explanations
parenthetical comments at the end of sentences
Negative Logits
inund          -0.82
undet          -0.80
stagn          -0.78
spir           -0.77
footing        -0.74
pudding        -0.74
ogly           -0.74
exemplary      -0.74
overwhelmed    -0.73
unus           -0.73
Positive Logits
…)             1.38
Though         1.25
Laughs         1.25
See            1.23
laughs         1.22
Unless         1.22
Ironically     1.20
Also           1.20
hide           1.19
Actually       1.17
Activations Density 0.037%