INDEX
Explanations
instructions
This neuron detects the command token “start”: it fires on user instructions that begin with the word “start.”
Negative Logits
(options    -0.07
deity       -0.06
rice        -0.06
انگ    -0.06
/layouts    -0.06
Oslo        -0.06
rock        -0.05
blo         -0.05
col         -0.05
filter      -0.05
Positive Logits
имо         0.07
плат        0.07
porate      0.07
vandalism   0.07
amatør      0.06
구글상위    0.06
jedn        0.06
yı          0.06
-common     0.06
istant      0.06
Activations Density 0.139%
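
The page does not say how the density figure is computed. A common reading is the fraction of token positions on which the neuron's activation is nonzero. The sketch below assumes that definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the neuron fires on 139 of 100,000 token positions.
sample = [1.0] * 139 + [0.0] * (100_000 - 139)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumed definition, a density of 0.139% means the neuron is active on roughly 1 in 720 tokens, which fits a feature tied to one specific command word.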