INDEX
Explanations
proper nouns
the definite article "The" in various contexts
Negative Logits
gpu          -0.70
ounces       -0.64
beforehand   -0.61
with         -0.61
leeve        -0.61
/"           -0.60
ccoli        -0.58
directly     -0.58
§§           -0.57
patiently    -0.57
Positive Logits
oret         1.61
odore        1.46
resa         1.37
atre         1.16
ories        1.15
simplest     1.00
notion       0.99
easiest      0.97
biggest      0.94
orem         0.93
Activations Density 0.339%