    Explanations

    This neuron primarily detects occurrences of the word “negative.”

    Negative Logits
    нам          -0.08
     بزر         -0.07
    owntown      -0.06
     moons       -0.06
     Convert     -0.06
    ezpe         -0.06
    Fitness      -0.06
    Wr           -0.06
    badge        -0.06
    "))))↵       -0.06
    Positive Logits
     الل               0.07
     hous              0.07
     dispers           0.06
    (token missing)    0.06
     ">↵               0.06
     Tight             0.06
     conceive          0.06
    音乐               0.06
     borrowed          0.06
     věci              0.06
    Act Density 0.003%
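
The page does not say how the activation density figure above is computed. A common reading, assumed here, is the percentage of sampled tokens on which the neuron's activation is above zero. The sketch below illustrates that assumption only; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron fires at all. The page itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 3 active tokens out of 100,000.
sample = [0.0] * 99_997 + [0.4, 1.2, 0.05]
density = activation_density(sample)
print(f"Act Density {density:.3f}%")
```

Under this reading, a density this low means the neuron fires on only a few tokens per hundred thousand, so a small sample of text could easily contain no activating examples at all.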

    No Known Activations