INDEX
    Explanations

    phrases or words related to being direct, upfront, or uncomplicated

    Negative Logits
    Token           Logit
    "è¦ļéĨĴ"        -0.79
    " Lauder"       -0.76
    " livest"       -0.71
    "mble"          -0.68
    "ĸļ"            -0.67
    "Downloadha"    -0.66
    " Ples"         -0.66
    "theless"       -0.65
    "7601"          -0.64
    " mur"          -0.64

    Positive Logits
    Token           Logit
    "ened"          1.35
    "away"          1.20
    "ening"         1.19
    "eners"         1.09
    "forward"       1.00
    "ener"          0.92
    "edge"          0.90
    "line"          0.89
    "bent"          0.88
    "FIX"           0.88
    Activation Density: 0.023%

    No Known Activations