INDEX
Explanations
This neuron activates on the tokens that form the phrase “instead of.”
Negative Logits
everytime    -0.07
Ngày         -0.07
.On          -0.06
mutations    -0.06
day          -0.06
ریه          -0.06
dm           -0.06
convinced    -0.06
DAY          -0.06
opro         -0.06
Positive Logits
(hdc         0.07
vědom        0.07
Philly       0.07
äl           0.06
export       0.06
cra          0.06
cauliflower  0.06
\\/          0.06
+=(          0.06
POOL         0.06
Activations Density 0.005%