    Explanations

    adjectives expressing evaluation or suitability

    Negative Logits
        Token           Logit
        " emphat"       -1.49
        " !..."         -1.42
        " ?..."         -1.37
        " fuf"          -1.34
        " increa"       -1.33
        " indestru"     -1.33
        " desir"        -1.31
        " accla"        -1.30
        " nece"         -1.25
        " suspic"       -1.24

    Positive Logits
        Token           Logit
        "."              0.69
        " enough"        0.68
        ";"              0.67
        "?"              0.62
        " for"           0.61
        ","              0.60
        ":"              0.59
        (not rendered)   0.58
        " in"            0.58
        "!"              0.58
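    The two lists above rank tokens by how strongly this feature pushes their output logits down or up. A minimal sketch of how such lists could be produced, assuming (this page does not say so) that each value is the dot product of the feature's decoder direction with a token's unembedding vector. The functions `logit_effects` and `top_k_tokens` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed logit effect: dot product of the feature's decoder
    direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}


def top_k_tokens(effects: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy 2-dimensional example (hypothetical vectors, not taken from this feature).
decoder_dir = [1.0, -0.5]
unembed = {
    " enough": [0.4, -0.2],
    ".": [0.5, -0.4],
    " emphat": [-1.0, 0.5],
    " suspic": [-0.8, 0.6],
}
neg, pos = top_k_tokens(logit_effects(decoder_dir, unembed), k=2)
print("negative:", neg)
print("positive:", pos)
```

    Sorting once and slicing from both ends keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial selection would avoid a full sort.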
    Activation density: 0.412%
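    A density of 0.412% means the feature fires on roughly 1 in 240 tokens. A minimal sketch of the computation, assuming density is the percentage of sampled tokens whose activation is above zero; `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`
    (assumed definition of activation density)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# 412 firing tokens out of 100,000 sampled tokens (illustrative numbers).
sample = [0.0] * 99_588 + [1.5] * 412
print(f"{activation_density(sample):.3f}%")  # prints "0.412%"
```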

    No Known Activations