    Explanations

    The main thing this neuron does is detect occurrences of the word “filter.”

    Negative Logits
    Token             Logit
    " overseeing"     -0.07
    " nen"            -0.07
    " Stan"           -0.07
    " Stanley"        -0.07
    " conceived"      -0.07
    "ad"              -0.07
    "286"             -0.07
    " Nou"            -0.07
    "23"              -0.06
    " Conan"          -0.06
    Positive Logits
    Token             Logit
    " filter"         0.14
    " Filter"         0.13
    "Filter"          0.12
    " filters"        0.11
    "filter"          0.11
    " FILTER"         0.10
    "FILTER"          0.09
    "filtr"           0.09
    "filtered"        0.09
    " filtered"       0.09
    Activation Density: 0.019%

    No Known Activations