INDEX
Explanations
proper nouns or specific entities
the word "the" in various contexts
Negative Logits
thood         -0.72
iffe          -0.71
earch         -0.70
leeve         -0.69
rehend        -0.68
Background    -0.67
rade          -0.67
verage        -0.66
eno           -0.64
hire          -0.64
Positive Logits
oret          1.23
latter        1.18
longest       1.16
shortest      1.14
same          1.11
fastest       1.11
biggest       1.09
smallest      1.08
largest       1.07
simplest      1.07
Activations Density: 0.168%