Explanations
phrases related to height and physical comparisons
Negative Logits
Isn      -0.29
Isn      -0.28
Aren     -0.23
isn      -0.21
icio     -0.18
aren     -0.18
wasn     -0.17
åij¢     -0.16
ounder   -0.16
robe     -0.16
Positive Logits
nor      0.21
did      0.19
does     0.18
do       0.17
NOR      0.16
Nor      0.16
Nor      0.15
oron     0.15
odzi     0.15
huh      0.15
Activations Density 0.050%