    Explanations

    This neuron activates on the word “dynasty” (and its subword fragments) whenever Chinese dynastic names or the term “dynasty” appear.

    Negative Logits
    Token             Logit
    " PO"             -0.07
    "Interview"       -0.07
    " Essentially"    -0.07
    "(outfile"        -0.06
    " alarm"          -0.06
    " neighbors"      -0.06
    "-cell"           -0.06
    "-circle"         -0.06
    " Kemp"           -0.06
    "-floor"          -0.06
    Positive Logits
    Token             Logit
    " dynasty"        0.09
    " Dynasty"        0.09
    " dyn"            0.07
    "uz"              0.07
    "υγ"              0.07
    "νια"             0.07
    "сих"             0.06
    "ZIP"             0.06
    " обычно"         0.06
    "dyn"             0.06
    Act Density 0.002%

    No Known Activations