Explanations
occurrences of the word "the"
Negative Logits
token      logit
ahead      -0.16
Singer     -0.15
oler       -0.14
edn        -0.14
uesta      -0.14
icker      -0.14
ods        -0.14
Ïħνα       -0.14
various    -0.14
uli        -0.14
Positive Logits
token      logit
brains     0.21
envy       0.20
sole       0.20
brains     0.19
only       0.19
son        0.18
ONLY       0.18
sole       0.17
recipient  0.17
lead       0.17
Activations Density: 0.073%