    Explanations

    Hate and dislike

    This neuron activates on words and phrases expressing strong negative emotions or interpersonal conflict (e.g. hate, annoy, nuts).

    Negative Logits
    Token            Logit
    'ет'             -0.07
    'Traits'         -0.06
    'ADDRESS'        -0.06
    'Both'           -0.06
    ' trata'         -0.06
    'tracking'       -0.06
    ' contained'     -0.06
    ' disaster'      -0.06
    'Request'        -0.06
    ' Tears'         -0.06
    Positive Logits
    Token            Logit
    (token missing)  0.07
    (token missing)  0.07
    ' příro'         0.06
    ' showc'         0.06
    ' seviy'         0.06
    'visor'          0.06
    '=_("'           0.06
    ' نت'            0.06
    ' haze'          0.06
    ' nive'          0.06
    Act Density 0.048%
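
    The page does not say how the Act Density figure is computed. One common reading, assumed here rather than confirmed by the source, is the fraction of sampled token positions on which the neuron's activation exceeds a threshold. The sketch below illustrates that reading; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (active positions) / (total positions).
    An empty input is treated as density 0.0.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 48 of which are active.
sample = [0.0] * (100_000 - 48) + [1.0] * 48
density = activation_density(sample)
print(f"Act Density {density:.3%}")
```

    Under this reading, a density this low would mean the neuron is active on roughly one token position in two thousand.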

    No Known Activations