INDEX
Explanations
the verb "to be" and its various forms in different contexts
Negative Logits
hops            -0.63
ski             -0.60
Strawberry      -0.58
hurricanes      -0.58
SNAP            -0.58
Helic           -0.57
DRAG            -0.57
grandchildren   -0.55
Rubin           -0.54
Pin             -0.54
Positive Logits
inated          0.82
ored            0.81
inate           0.80
oured           0.76
culus           0.72
phas            0.72
ony             0.72
tre             0.72
ax              0.70
aker            0.70
Activations Density: 0.007%