INDEX
    Explanations

    This neuron responds to positive evaluative words that express praise or favorable opinion (e.g. “great,” “good,” “nice,” “alright”).

    Negative Logits
     bees             -0.08
     brib             -0.07
     fatty            -0.07
     tn               -0.07
     AUTHORS          -0.06
     suppliers        -0.06
                      -0.06
     doma             -0.06
     qualidade        -0.06
     smallest         -0.06
    Positive Logits
    _SOURCE           0.07
    ційної            0.07
    _POLL             0.06
    ongan             0.06
    omain             0.06
    \',               0.06
    embros            0.06
    “                 0.06
     Sınıf            0.06
    ArgsConstructor   0.06
    Act Density 0.070%

    No Known Activations