INDEX
    Explanations

    nouns related to actions or outcomes

    Negative Logits
     congr            -0.69
    etts              -0.67
     thanked          -0.63
    knit              -0.62
     congratulated    -0.61
     Roots            -0.60
    erning            -0.59
     Redd             -0.57
     recognizes       -0.56
    nect              -0.56
    Positive Logits
     incorrectly      1.52
     inconsist        1.50
     wrong            1.45
     poorly           1.41
     incorrect        1.41
     inappropriately  1.35
     improperly       1.33
     unnecessarily    1.30
     unsu             1.29
     inaccurate       1.29
    Activation Density: 1.029%

    No Known Activations