INDEX
Explanations
Mentioned in the text
This neuron detects section-header phrases that ask the reader to "list the organs mentioned" (i.e., instruction headings).
Negative Logits
доб	-0.07
Volk	-0.07
سبک	-0.07
862	-0.07
Karlov	-0.07
Mickey	-0.06
sexes	-0.06
tricky	-0.06
SA	-0.06
InOut	-0.06
Positive Logits
uyện	0.06
简单	0.06
όπου	0.06
.strip	0.06
rehabilit	0.06
._	0.06
]; ↵	0.06
_flight	0.06
unpaid	0.06
登場	0.06
Activations Density 0.014%
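As a rough illustration of what a density figure like the one above usually means, the sketch below computes the fraction of token positions on which a neuron fires. It assumes density is defined as "share of tokens with activation above a threshold"; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron is
    active at all; the dashboard's exact definition is not stated in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Hypothetical data: 10,000 token positions, exactly one of which activates.
    sample = [0.0] * 9_999 + [0.7]
    print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.014% would correspond to the neuron being active on roughly 14 out of every 100,000 tokens.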