    Explanations

    This neuron activates on positive evaluative words (such as “loved,” “enjoyed,” and “amazing”) that indicate praise or enthusiasm.

    Negative Logits
    Token             Logit
    " quoted"         -0.08
    " puppy"          -0.07
    " rainy"          -0.07
    "ươi"             -0.06
    "Mov"             -0.06
    " ease"           -0.06
    "_assignment"     -0.06
    "WARDS"           -0.06
    " WAIT"           -0.06
    " playoffs"       -0.06

    Positive Logits
    Token             Logit
    " Funds"          0.07
    " enjoyed"        0.07
    "469"             0.06
    " dislike"        0.06
    " decentral"      0.06
    " еди"            0.06
    "(err"            0.06
    "wand"            0.06
    "atherine"        0.06
    "243"             0.06
    Activation Density: 0.014%

    No Known Activations