INDEX
    Explanations

    The neuron flags tokens that appear in the “Summary:” portion of the prompt (i.e. it activates on words in the summary rather than in the article).
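
To make the explanation above concrete, here is a minimal sketch of how one might mark which tokens fall in the summary portion of a prompt, so that the neuron's activations can be compared against that mask. The function name `summary_mask`, the default marker string, and the sample tokens are all hypothetical; this is not the dashboard's own method.

```python
def summary_mask(tokens, marker="Summary:"):
    """Return one boolean per token: True if the token comes after the marker.

    Assumption: the prompt contains the literal marker text once, and the
    tokens concatenate back to the original prompt string.
    """
    text = ""
    mask = []
    seen = False
    for tok in tokens:
        # A token is "in the summary" only if the marker was completed earlier.
        mask.append(seen)
        text += tok
        if not seen and marker in text:
            seen = True
    return mask


# Hypothetical tokenization of a short article/summary prompt.
tokens = ["Article", ":", " cats", " sleep", ".", "\n",
          "Summary", ":", " cats", " nap"]
mask = summary_mask(tokens)
summary_tokens = [t for t, m in zip(tokens, mask) if m]
```

If the explanation holds, nonzero activations should fall mostly on positions where the mask is True.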

    Negative Logits
    "ForgeryToken"    -0.07
    "上が"            -0.06
    "premium"         -0.06
    " hundreds"       -0.06
    " t�"             -0.06
    (blank)           -0.06
    " Hundreds"       -0.06
    " khỏe"           -0.06
    "IFT"             -0.06
    " airline"        -0.06
    Positive Logits
    "увався"          0.07
    " Merkezi"        0.06
    " incess"         0.06
    " وظ"             0.06
    "sock"            0.06
    "tabl"            0.06
    " клуб"           0.06
    " Sheila"         0.06
    "-END"            0.06
    "-ab"             0.06
    Act Density 0.010%
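
As a rough illustration of what a density figure like this can mean, the sketch below computes the fraction of token positions on which a neuron's activation exceeds a threshold. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration; the dashboard may compute its number differently.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation is strictly above the threshold.

    Assumption: "density" means the share of tokens with a nonzero
    (above-threshold) activation.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: one active position out of 10,000 tokens.
acts = [0.0] * 9999 + [1.5]
density = activation_density(acts)
density_text = f"{density:.3%}"
```

Under this assumed definition, a density of 0.010% corresponds to about one active token in every 10,000.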

    No Known Activations