INDEX
    Explanations

    statements expressing negation or contradiction

    Negative Logits
    Token                 Logit
    " actionMode"         -0.53
    " ostavi"             -0.52
    "UniformLocation"     -0.52
    " Vaux"               -0.46
    "انجليز"              -0.46
    " ويكيميديا"          -0.46
    " referrerpolicy"     -0.45
    " recevrez"           -0.44
    "้งาน"                -0.44
    " izd"                -0.43
    Positive Logits
    Token                 Logit
    " forget"             1.00
    " misunderstand"      0.88
    " worry"              0.74
    " misunderstood"      0.73
    " Forget"             0.69
    " forgetting"         0.68
    " Donny"              0.67
    " FetchType"          0.66
    "AnchorStyles"        0.65
    "forget"              0.65
    Activation Density: 0.048%

    No Known Activations