INDEX
Explanations
phrases that emphasize the presence of "the" in various contexts
Negative Logits
remainder   -0.18
brighter    -0.16
sharper     -0.16
igham       -0.16
oten        -0.15
igger       -0.15
rens        -0.15
happier     -0.14
ichen       -0.14
eness       -0.14
Positive Logits
ici         0.29
rou         0.26
bol         0.26
fran        0.26
flatt       0.25
col         0.25
slee        0.25
dri         0.24
cris        0.24
slic        0.23
Activations Density 0.262%