INDEX
Explanations
The neuron activates on words that mark steps or instructions, such as "Next".
instances of the word "Next."
Negative Logits
Feldman          -0.75
sexes            -0.67
brim             -0.66
Sinai            -0.64
utic             -0.60
iveness          -0.60
Ãĸ               -0.60
Erie             -0.59
ted              -0.59
heterogeneity    -0.59
Positive Logits
ĻĤ               0.80
Next             0.78
door             0.76
Scene            0.73
millenn          0.72
installment      0.69
ļéĨĴ             0.69
week             0.68
Phase            0.68
phase            0.68
Activations Density 0.035%
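
As a rough illustration of how a density figure like the one above could be read, the sketch below assumes that activation density means the fraction of token positions on which the neuron's activation exceeds a threshold. That definition, the function name activation_density, the threshold parameter, and the toy data are all assumptions made for illustration; the page itself does not state how the number is computed.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition for illustration; the source page does not specify one.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): the neuron fires on 7 of 20,000 token positions.
toy_activations = [0.0] * 19_993 + [1.2] * 7
density = activation_density(toy_activations)

# Format as a percentage with three decimal places, as on the dashboard.
print(f"{density:.3%}")
```

Under this reading, a density of 0.035% would mean the neuron is active on roughly 3 to 4 tokens out of every 10,000, which fits a feature tied to a narrow set of words.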