INDEX
    Explanations

    words that convey evaluations or judgments about people or situations

    Negative Logits
    Token        Logit
    "oe"         -0.16
    "892"        -0.15
    "ayo"        -0.14
    "urtle"      -0.14
    "awan"       -0.14
    "rij"        -0.14
    " hoping"    -0.13
    " Gig"       -0.13
    "xab"        -0.13
    "odied"      -0.13
    Positive Logits
    Token          Logit
    " result"      0.29
    " due"         0.29
    "due"          0.28
    " based"       0.26
    " resultado"   0.25
    "_due"         0.24
    " thanks"      0.23
    " resultat"    0.23
    " driven"      0.23
    "thanks"       0.23
    Activation Density: 0.012%

    No Known Activations