INDEX
    Explanations

    references to "gore," particularly in relation to its violent or horrific context

    Negative Logits
    olib
    -0.16
    中文
    -0.15
    ATTERN
    -0.14
    coop
    -0.14
    كة
    -0.14
    wiki
    -0.14
    rint
    -0.14
    nP
    -0.13
     lick
    -0.13
    \Active
    -0.13
    Positive Logits
     below
    0.16
     public
    0.15
     Tu
    0.15
    pray
    0.14
    _UNUSED
    0.14
     McInt
    0.14
     fract
    0.14
    agraph
    0.14
     fee
    0.13
    isty
    0.13
    Activation Density: 0.003%

    No Known Activations