    Explanations

    phrases indicating performance or outcome assessment, particularly focusing on whether something is performing well or not

    expressions of performance or effectiveness

    Negative Logits
    Token        Logit
    "adena"      -0.79
    "ategory"    -0.78
    "ory"        -0.76
    "ruce"       -0.72
    "İĭ"         -0.72
    "hyde"       -0.71
    "ules"       -0.70
    "atto"       -0.69
    "orical"     -0.66
    "mitting"    -0.65
    Positive Logits
    Token        Logit
    " enough"    1.07
    "enough"     0.99
    "esley"      0.79
    " Enough"    0.77
    " behaved"   0.77
    "espie"      0.73
    " suited"    0.71
    " liked"     0.70
    "baum"       0.68
    " alright"   0.66
    Act Density 0.031%

    No Known Activations