INDEX
    Explanations

    The neuron responds to uppercase acronym‐style tokens (e.g. “U”, “GH”) in the text.

    Negative Logits
    licity                -0.08
    alom                  -0.07
    uria                  -0.07
    оны                   -0.07
    них                   -0.07
    (token not rendered)  -0.07
    хо                    -0.07
    woods                 -0.07
    &M                    -0.07
    Century               -0.07
    Positive Logits
     (?,                  0.06
     "),↵                 0.06
     %↵↵                  0.06
    '];↵                  0.06
     mağ                  0.06
     %↵                   0.06
     @{↵                  0.06
    (token not rendered)  0.06
    ”。↵↵                  0.06
    (token not rendered)  0.06
    Activation Density 0.346%

    No Known Activations