    Explanations

    The neuron flags words and phrases marking controversies or scandals (e.g. “offensive,” “resurfaced,” “controversy,” “#MeToo”).

    Negative Logits
    Token         Logit
    "ани"         -0.07
    " deliver"    -0.07
    ".mu"         -0.06
    "IBC"         -0.06
    "IMER"        -0.06
    "ESS"         -0.06
    "енд"         -0.06
    "EE"          -0.06
    " spanking"   -0.06
    "-legged"     -0.06
    Positive Logits
    Token         Logit
    " Xml"        0.07
    ",url"        0.07
    " edited"     0.06
    " topic"      0.06
    ".aut"        0.06
    "_href"       0.06
    "共同"        0.06
    " skin"       0.06
    "ria"         0.06
    "dera"        0.06
    Activation Density: 0.008%

    No Known Activations