INDEX
    Explanations

    articles/prepositions

    Negative Logits
    Logit   Token
    -0.07   "分布"
    -0.06   " humiliating"
    -0.06   " Jer"
    -0.06   "aced"
    -0.06   " Dub"
    -0.06   ":::/"
    -0.06   " Minist"
    -0.06   " расс"
    -0.06   " discrepan"
    -0.05   "Zend"
    Positive Logits
    Logit   Token
    0.08    "terrorism"
    0.07    "({↵"
    0.07    " medicinal"
    0.07    "_tool"
    0.06    " downwards"
    0.06    "phen"
    0.06    " durumlarda"
    0.06    "getPost"
    0.06    "_lc"
    0.06    " ******************************************************************************/↵↵"
    Act Density 0.019%

    No Known Activations