    Explanations

    previously seen

    The neuron activates on phrases in which the author states how many times they have experienced (e.g., “seen”) something.

    Negative Logits
    Token               Logit
     bro                -0.07
    _u                  -0.06
    のだ                -0.06
    (token not shown)   -0.06
    ouflage             -0.06
    (token not shown)   -0.06
     Benny              -0.06
    /drivers            -0.06
     değildir           -0.06
    _ord                -0.06

    Positive Logits
    Token               Logit
    (token not shown)   0.07
     подс               0.07
    (token not shown)   0.06
    -ts                 0.06
    Lights              0.06
     जल                 0.06
     ",";↵              0.06
    atts                0.06
     Patton             0.06
    itaire              0.06
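    Lists like the two above are commonly produced by projecting a neuron's output direction onto the model's unembedding matrix and keeping the most negative and most positive entries. The page does not say how its values were computed, so the sketch below only assumes that procedure; the function `top_logit_effects`, the toy vectors, and the 4-token vocabulary are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def top_logit_effects(
    direction: Sequence[float],
    unembed: Dict[str, Sequence[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score every token by the dot product of the neuron's output
    direction with that token's unembedding vector, then return the
    k lowest (negative logits) and k highest (positive logits)."""
    scores = {
        token: sum(d * u for d, u in zip(direction, vector))
        for token, vector in unembed.items()
    }
    # Ascending order: most suppressed tokens first.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


# Toy example with a hypothetical 2-dimensional model and 4-token vocabulary.
direction = [1.0, -0.5]
unembed = {
    "a": [0.1, 0.0],
    "b": [0.0, 0.2],
    "c": [0.05, -0.2],
    "d": [-0.02, 0.02],
}
negative, positive = top_logit_effects(direction, unembed, k=2)
print("negative:", [token for token, _ in negative])  # negative: ['b', 'd']
print("positive:", [token for token, _ in positive])  # positive: ['c', 'a']
```

    A real model would use the full vocabulary and k = 10, which matches the ten entries shown in each list.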
    Activation density: 0.046%
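    Activation density is commonly read as the percentage of token positions on which the neuron fires. Assuming a firing threshold of zero (an assumption; the page does not state a threshold), 0.046% corresponds to roughly 46 active positions per 100,000 tokens. The helper `activation_density` below is a hypothetical sketch of that arithmetic.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation
    exceeds `threshold` (a hypothetical cutoff; 0.0 by default)."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 46 firing positions out of 100,000 tokens gives a density of 0.046%.
acts = [1.0] * 46 + [0.0] * (100_000 - 46)
print(f"{activation_density(acts):.3f}%")  # 0.046%
```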

    No Known Activations