INDEX
    Explanations

    automatically

    The neuron flags adverbs (especially ones ending in “-ly”).

    Negative Logits
    Token                 Logit
    " Dar"                -0.07
    " sin"                -0.07
    " ||"                 -0.06
    "679"                 -0.06
    (token not shown)     -0.06
    " sd"                 -0.06
    "III"                 -0.06
    (token not shown)     -0.06
    " cliff"              -0.06
    ".cl"                 -0.05
    Positive Logits
    Token                 Logit
    " automatically"      0.14
    "autom"               0.09
    " autom"              0.09
    " автом"              0.08
    " Headquarters"       0.08
    "Autom"               0.08
    "ocom"                0.08
    " automáticamente"    0.07
    " OPT"                0.07
    "汽车"                0.07
    Activation Density: 0.008%

    No Known Activations