INDEX
    Explanations

    strong evaluative words

    words related to expressions of emotion or descriptors that convey disapproval

    Negative Logits
    enhagen       -0.89
     Keller       -0.78
     Shepherd     -0.77
     Hayward      -0.75
     Jenkins      -0.73
    姫            -0.73
     Grayson      -0.73
     Frey         -0.72
     Decker       -0.71
     Lauder       -0.70
    Positive Logits
    anc            1.01
    withstanding   1.00
    arching        0.97
    rew            0.95
    aring          0.95
    rown           0.92
    lic            0.91
    ounded         0.90
    arr            0.90
    ob             0.89
    Activation Density: 0.190%

    No Known Activations