INDEX
    Explanations

    mentions of weakness and vulnerability

    Negative Logits
    νε               -0.16
     âĸij            -0.16
    aylor            -0.16
    .truth           -0.15
    bens             -0.15
    /***/            -0.15
     gratuitement    -0.14
    oÄį              -0.14
    çͲ               -0.14
    BJECT            -0.14
    Positive Logits
    plib             0.17
    dou              0.15
     Pf              0.15
    225              0.15
    233              0.14
     League          0.14
    while            0.14
    PP               0.14
    rist             0.14
    atur             0.14
    Activation Density: 0.007%

    No Known Activations