    Explanations

    phrases indicating success or effectiveness

    expressions indicating effectiveness or success

    Negative Logits
    Token         Logit
    "ategory"     -0.75
    "agine"       -0.69
    "htaking"     -0.69
    "rush"        -0.68
    "ilities"     -0.67
    "guyen"       -0.67
    "hyde"        -0.66
    "avorite"     -0.66
    "amera"       -0.66
    "ilion"       -0.65

    Positive Logits
    Token         Logit
    " enough"      1.30
    "enough"       1.19
    " Enough"      0.91
    "bye"          0.81
    " behaved"     0.80
    "baum"         0.80
    "esley"        0.77
    "spring"       0.77
    " suited"      0.75
    " vers"        0.69
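    The page lists the ten most positive and most negative logits but does not say how they were computed. One common approach, assumed here, is to project the feature's output direction through the model's unembedding matrix and rank the per-token results. In this minimal sketch, `project_direction` and `top_logits` are hypothetical helpers. The unembedding numbers are made up, and the ranking example reuses a few values listed above.

```python
from typing import Dict, List, Tuple


def project_direction(direction: List[float], unembed: List[List[float]]) -> List[float]:
    """Hypothetical helper: map a feature direction to one logit per vocabulary token.

    Assumes `unembed` has shape [d_model][vocab_size].
    """
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        for t in range(vocab_size)
    ]


def top_logits(
    logits: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most positive and k most negative tokens."""
    ranked = sorted(logits.items(), key=lambda item: item[1])  # ascending by logit
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy unembedding for a 2-dimensional model and a 3-token vocabulary (made-up numbers).
vocab = [" enough", "bye", "ategory"]
unembed = [[1.0, 0.5, -1.0],
           [0.3, 0.3, 0.25]]
direction = [1.0, 1.0]
projected = dict(zip(vocab, project_direction(direction, unembed)))

# Ranking a handful of the values listed on this page.
page_logits = {" enough": 1.30, "enough": 1.19, "bye": 0.81,
               "ategory": -0.75, "agine": -0.69}
positive, negative = top_logits(page_logits, k=2)
print(positive)  # [(' enough', 1.3), ('enough', 1.19)]
print(negative)  # [('ategory', -0.75), ('agine', -0.69)]
```

    Token strings are kept exactly as the tokenizer produces them, so " enough" (with a leading space) and "enough" are ranked as separate entries.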
    Activation Density: 0.039%

    No Known Activations
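    Read as a share of tokens, an activation density of 0.039% means the feature fires on roughly 39 of every 100,000 tokens (about 1 in 2,560). The sketch below assumes "activation density" means the percentage of token positions whose activation is greater than zero. The page defines neither the metric nor the corpus. `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: density = share of tokens on which the feature fires
    (activation > 0); the source page does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 39 firing positions out of 100,000 tokens gives the 0.039% shown on this page.
acts = [0.0] * 100_000
for i in range(39):
    acts[i * 2_000] = 1.0  # arbitrary nonzero activations at spread-out positions
print(activation_density(acts))  # 0.039
```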