    Explanations

    The neuron fires selectively on occurrences of the token “way”.

    Negative Logits
    Token             Logit
    "字符"            -0.07
    " abyss"          -0.07
    " κύ"             -0.07
    "yum"             -0.07
    "itals"           -0.07
    " Adoles"         -0.06
    " suggests"       -0.06
    "하는데"          -0.06
    " WWE"            -0.06
    " Significant"    -0.06
    Positive Logits
    Token             Logit
    " way"            0.08
    "_Form"           0.07
    "اخت"             0.07
    "_written"        0.07
    "�a"              0.06
    " Sty"            0.06
    "andatory"        0.06
    "_OVERFLOW"       0.06
    " Dy"             0.06
    "editary"         0.06
    Activation Density: 0.011%
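
    A density of 0.011% corresponds to roughly 11 tokens in every 100,000. The page does not define the metric. The sketch below assumes it is the fraction of sampled tokens on which the neuron's activation exceeds a threshold (zero here). The function name `activation_density` and the sample activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above `threshold`.

    Assumed definition; the source page does not state how density is measured.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy sample: the neuron fires on 1 of 10 tokens.
sample = [0.0] * 9 + [1.3]
density = activation_density(sample)
print(f"{density:.3%}")
```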

    No Known Activations