    Explanations

    expressions related to protests and social injustice

    Negative Logits
     kesin
    -0.15
     обяз
    -0.15
    epy
    -0.14
    Į¨
    -0.14
    decltype
    -0.14
    pron
    -0.14
     permanently
    -0.14
    isex
    -0.14
     $__
    -0.13
     pron
    -0.13
    Positive Logits
     harmless
    0.49
     innocent
    0.47
     innoc
    0.47
     perfectly
    0.46
     legitimate
    0.46
     lawful
    0.36
     benign
    0.36
     Innoc
    0.36
    正常
    0.34
     valid
    0.34
    Act Density 0.560%

    No Known Activations