INDEX
Explanations
This neuron activates on the word “dynasty” (and its subword fragments) whenever Chinese dynastic names or the term “dynasty” appear.
Negative Logits
PO           -0.07
Interview    -0.07
Essentially  -0.07
(outfile     -0.06
alarm        -0.06
neighbors    -0.06
-cell        -0.06
-circle      -0.06
Kemp         -0.06
-floor       -0.06
Positive Logits
dynasty      0.09
Dynasty      0.09
dyn          0.07
uz           0.07
υγ           0.07
νια          0.07
сих          0.06
ZIP          0.06
обычно       0.06
dyn          0.06
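The page does not say how the logit lists above are produced. A common approach, assumed here, is to project the neuron's output weights through the model's unembedding matrix and rank the vocabulary by the resulting score. The sketch below illustrates that idea on toy data only; `logit_effects`, `top_and_bottom`, the weight values, and the three-token vocabulary are all hypothetical and are not taken from the model behind this page.

```python
from typing import Dict, List, Tuple


def logit_effects(neuron_out: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the neuron's output direction with each token's
    unembedding vector (assumed method, not confirmed by the page)."""
    return {
        token: sum(w * u for w, u in zip(neuron_out, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]],
                                    List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Toy data: a 2-dimensional residual stream and a 3-token vocabulary.
toy_neuron_out = [1.0, 0.0]
toy_unembed = {
    "tok_a": [0.5, 0.3],
    "tok_b": [-0.25, 0.9],
    "tok_c": [0.0, 1.0],
}

effects = logit_effects(toy_neuron_out, toy_unembed)
positive, negative = top_and_bottom(effects, k=1)
print("positive:", positive)
print("negative:", negative)
```

Under this assumption, a token in the positive list is one whose logit the neuron pushes up when it fires, and a token in the negative list is one it pushes down.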
Activations Density 0.002%
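The page does not define the density figure. Assuming it is the share of sampled tokens on which the neuron has a nonzero activation, 0.002% corresponds to roughly 2 active tokens per 100,000. A minimal sketch of that calculation follows; `activation_density` and its `threshold` parameter are hypothetical names, and the sample is synthetic.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumes density means "fraction of tokens with activation above
    zero"; the source page does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic sample: 100,000 tokens, 2 of which activate the neuron.
sample = [0.0] * 100_000
sample[10] = 1.5
sample[42_000] = 0.7

density_percent = activation_density(sample)
print(f"{density_percent:.3f}%")
```

A density this low is consistent with a neuron that fires only on a narrow set of tokens, as the explanation describes.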