INDEX
    Explanations

    The neuron fires on occurrences of the word “variant.”

    Negative Logits
    Token       Logit
    "Lee"       -0.07
    " bio"      -0.07
    "Wy"        -0.07
    " kHz"      -0.07
    "Bio"       -0.07
    " hello"    -0.07
    " Lee"      -0.07
    " Gro"      -0.06
    "ro"        -0.06
    " xy"       -0.06
    Positive Logits
    Token       Logit
    "ant"       0.13
    " variant"  0.12
    "ANT"       0.12
    "ulant"     0.10
    " mutant"   0.10
    "ent"       0.09
    "ант"       0.09
    "vant"      0.09
    "nant"      0.09
    "quent"     0.09
    Activation Density: 0.047%

    No Known Activations