    Explanations

    words related to negative actions or qualities

    Negative Logits
    Token         Logit
    "装"          -0.80
    "aterial"     -0.79
    "asio"        -0.77
    "socket"      -0.75
    "版"          -0.75
    "ainted"      -0.74
    "marked"      -0.73
    "emis"        -0.72
    "ector"       -0.72
    "orthy"       -0.71
    Positive Logits
    Token              Logit
    " disregard"       1.32
    " pursuit"         1.28
    "ness"             1.20
    " antics"          1.20
    " abandon"         1.13
    " grin"            1.11
    " arrogance"       1.08
    " indifference"    1.08
    " outburst"        1.06
    " behavior"        1.06
    Activation Density: 0.187%

    No Known Activations