    Explanations

    Negation and prevention

    Negative Logits
     praise           -0.06
    opensource        -0.06
     BufferedReader   -0.06
    NotFound          -0.06
    ['_               -0.06
    +");↵             -0.06
     Ally             -0.06
     proven           -0.06
                      -0.06
     treadmill        -0.06

    Positive Logits
    osten             0.07
     azalt            0.07
     لو               0.06
    [property         0.06
     loung            0.06
    (dy               0.06
    、今               0.06
     kullanım         0.06
    uni               0.06
     entitled         0.06
    Act Density 0.027%

    No Known Activations