    Explanations

    Negations and phrases that indicate uncertainty or lack of affirmation.

    Negative logits
        Token          Logit
        "amoan"        -0.64
        " Вікі"        -0.62
        "iciary"       -0.61
        "rician"       -0.56
        "amaran"       -0.56
        "msgTypes"     -0.56
        " Faso"        -0.55
        "twimg"        -0.54
        "|$."          -0.53
        "unhofer"      -0.52
    Positive logits
        Token          Logit
        "ารถ"           0.59
        " meta"         0.56
        "meta"          0.52
        "Meta"          0.51
        " erk"          0.48
        " campi"        0.48
        "rensa"         0.47
        "ivir"          0.46
        " Meta"         0.45
        "ądź"           0.45
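    The page does not say how the two logit lists are produced. A common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix, giving one score per vocabulary token, and then to sort those scores. The real pipeline may also fold in a final layer norm or other scaling. The function names (`logit_effects`, `top_logits`) and the toy matrices are hypothetical and serve only to illustrate the ranking step.

```python
from typing import List, Sequence, Tuple

def logit_effects(direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token by projecting a feature direction through the unembedding.

    Assumption: direction has length d_model; unembed has d_model rows and vocab_size columns.
    """
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        for t in range(vocab_size)
    ]

def top_logits(direction: Sequence[float],
               unembed: Sequence[Sequence[float]],
               vocab: Sequence[str],
               k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, logit_effects(direction, unembed)),
                    key=lambda pair: pair[1])
    negative = ranked[:k]            # ascending order: most negative first
    positive = ranked[::-1][:k]      # descending order: most positive first
    return negative, positive

# Toy example (hypothetical numbers): d_model = 2, four-token vocabulary.
neg, pos = top_logits([1.0, -0.5],
                      [[0.5, -0.25, 1.0, 0.0],
                       [0.5, 0.5, -1.0, 2.0]],
                      ["a", "b", "c", "d"], k=2)
print(neg)  # [('d', -1.0), ('b', -0.5)]
print(pos)  # [('c', 1.5), ('a', 0.25)]
```

    With k = 10 over the full vocabulary, this procedure would yield two ten-row lists shaped like the ones above.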
    Activation density: 0.120%
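    Activation density is usually read as the percentage of sampled tokens on which the feature fires. The firing threshold (here assumed to be "greater than zero") and the size of the token sample are assumptions, because the page does not state them. At 0.120%, the feature fires on roughly 12 of every 10,000 tokens. The helper name `activation_density` is hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of sampled tokens whose feature activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# 12 firing tokens out of 10,000 sampled gives the density shown above.
acts = [0.0] * 10_000
for i in range(12):
    acts[i * 800] = 1.3   # hypothetical activation value on a firing token
print(f"{activation_density(acts):.3f}%")  # 0.120%
```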

    No Known Activations