INDEX
    Explanations

    words related to negativity or criticism

    the presence of the substring "pl" in various contexts

    Negative Logits
    Token                             Logit
    ▬                                 -0.74
    HAHA                              -0.71
    shapeshifter                      -0.70
    QUI                               -0.70
    DEM                               -0.68
    龍喚士                               -0.67
    ▬▬                                -0.66
    gerald                            -0.65
    ////////////////////////////////  -0.64
    spin                              -0.63
    Positive Logits
    Token       Logit
    asma        1.30
    acement     1.18
    atinum      1.18
    icably      1.12
    enty        1.10
    astic       1.08
    anted       1.02
    atter       1.01
    atoon       0.99
    ague        0.98
    Activation Density: 0.009%

    No Known Activations