    Explanations

    The neuron fires on the word “Sign,” especially when it appears as the first token of section headings or titles.

    Negative Logits
        " เช"          -0.06
        " Economics"   -0.06
        " upd"         -0.06
        "[:]"          -0.06
        "-release"     -0.06
        "438"          -0.06
        " clusters"    -0.06
        " WAL"         -0.06
        " Prel"        -0.06
        "_intro"       -0.06
    Positive Logits
        " sign"        0.10
        " Sign"        0.09
        "Sign"         0.08
        "\',"          0.07
        "ΗΜΑ"          0.06
        " SIGN"        0.06
        "...,"         0.06
        "_rewrite"     0.06
        " dig"         0.06
        "YG"           0.06
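    The page does not say how these logit values are produced. A common approach for dashboards like this one, assumed here rather than confirmed by the source, is to project the neuron's output weight vector through the model's unembedding matrix and rank the per-token scores: the highest become the positive logits and the lowest the negative ones. The minimal sketch below uses invented toy weights (they do not reproduce the values above), and the helper name `rank_logit_effects` is hypothetical.

```python
from typing import List, Sequence, Tuple


def rank_logit_effects(
    w_out: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: rank tokens by the neuron's direct effect on logits.

    Assumes w_out is the neuron's output weight vector (length d_model) and
    w_u is the unembedding matrix with d_model rows and len(vocab) columns.
    """
    d_model = len(w_out)
    # Score for token j is the dot product of w_out with column j of w_u.
    scores = [
        sum(w_out[i] * w_u[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score; sorted() is stable, so ties keep vocabulary order.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    positive = ranked[::-1][:k]  # largest scores first
    negative = ranked[:k]        # smallest scores first
    return positive, negative


# Toy example with made-up weights (d_model = 2, vocabulary of 4 tokens).
vocab = [" sign", " upd", " WAL", " Sign"]
w_out = [1.0, 0.5]
w_u = [[0.1, -0.2, 0.0, 0.3],
       [0.2, 0.0, -0.4, 0.1]]
positive, negative = rank_logit_effects(w_out, w_u, vocab, k=2)
print([t for t, _ in positive])  # [' Sign', ' sign']
print([t for t, _ in negative])  # [' upd', ' WAL']
```

    Tokens keep their leading spaces because, in most tokenizers, " sign" and "sign" are distinct vocabulary entries, which is why both appear in the list above.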
    Activation Density: 0.002%

    No Known Activations
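    Activation density is read here as the percentage of sampled tokens on which the neuron activates. That reading, and the zero threshold used below, are assumptions; the page does not state its cutoff. At 0.002%, the neuron would be active on roughly 2 tokens in every 100,000. A minimal sketch with a hypothetical `activation_density` helper:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens with activation above `threshold`.

    The default threshold of 0.0 is an assumption, not the dashboard's stated cutoff.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: 2 firing tokens out of 100,000 gives 0.002%.
acts = [0.0] * 99_998 + [1.3, 0.7]
print(f"{activation_density(acts):.3f}%")  # 0.002%
```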