INDEX
    Explanations

    This neuron fires on evaluative sentiment words—adjectives or adverbs expressing positive or negative opinion.

    Negative Logits
    Token                 Logit
    "sth"                 -0.06
    "descending"          -0.06
    "mony"                -0.06
    " tong"               -0.06
    "zburg"               -0.06
    "urses"               -0.06
    " Knife"              -0.06
    "France"              -0.05
    "atz"                 -0.05
    " vil"                -0.05
    Positive Logits
    Token                 Logit
                          0.07
    "formedURLException"  0.07
    " tienen"             0.07
    "Cre"                 0.06
    "_CHANGE"             0.06
    " UITextView"         0.06
    " indign"             0.06
    "_sf"                 0.06
    "áci"                 0.06
    "_sha"                0.06
    Activation Density: 0.076%

    No Known Activations