    Explanations

    unexpected behavior

    This neuron activates on content-bearing tokens (nouns, verbs, adjectives, punctuation, code identifiers, etc.) and stays inactive on common function words (the, and, to, etc.).

    Negative Logits
    Token            Logit
    " Ор"            -0.06
    "aversal"        -0.06
    "_with"          -0.06
    "visual"         -0.06
    " studi"         -0.06
    "官网"           -0.06
    ".retry"         -0.06
    " Walt"          -0.06
    " Schwe"         -0.06
    "rewrite"        -0.06

    Positive Logits
    Token            Logit
    " restricting"   0.06
    " ALL"           0.06
    " sprayed"       0.06
    "MET"            0.06
    " готов"         0.06
    " neigh"         0.06
    " continents"    0.06
    " weekday"       0.06
    (blank)          0.06
    " tow"           0.06

    Activation Density: 0.067%

    No Known Activations
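
The activation density figure above can be read as the share of token positions on which the neuron fires. The sketch below illustrates that reading under stated assumptions: the page does not define the metric, so the definition used here (percentage of positions with activation above a threshold of zero), the function name `activation_density`, and the sample data are all hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of token positions on which the
    neuron is active. The source page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 2 active positions out of 3000 tokens.
sample = [0.0] * 2998 + [0.4, 1.2]
density = activation_density(sample)
print(f"{density:.3f}%")  # prints "0.067%"
```

With this made-up sample, two active positions in 3000 give a density close to the value reported above, which shows how sparse such a neuron would be if the metric is defined this way.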