    Explanations

    expressions of commendation or praise

    Negative Logits
    "etra"          -0.58
    "into"          -0.55
    " Levin"        -0.53
    " Slo"          -0.52
    "ne"            -0.52
    "nage"          -0.51
    "b"             -0.51
    "зму"           -0.51
    "__))"          -0.50
    " costi"        -0.50
    Positive Logits
    " praise"        1.87
    " praises"       1.74
    " praising"      1.72
    " praised"       1.71
    "praise"         1.70
    " applaud"       1.54
    " Praise"        1.53
    " commend"       1.48
    " commendation"  1.43
    "Praise"         1.39
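The positive and negative logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A common way to produce such lists, assumed here rather than confirmed by this page, is to multiply the feature's decoder direction by the model's unembedding matrix and keep the k highest and k lowest scores. The sketch below uses pure Python and toy values; `decoder_dir`, `W_U`, `logit_effects`, and `top_and_bottom` are hypothetical names chosen for illustration.

```python
from typing import List, Tuple

def logit_effects(decoder_dir: List[float], W_U: List[List[float]]) -> List[float]:
    """Project a (hypothetical) feature decoder direction onto the unembedding.

    Assumes W_U is laid out as [d_model][vocab], so the result has one
    score per vocabulary token.
    """
    d_model = len(decoder_dir)
    vocab = len(W_U[0])
    return [sum(decoder_dir[i] * W_U[i][t] for i in range(d_model)) for t in range(vocab)]

def top_and_bottom(
    scores: List[float], tokens: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]           # lowest scores first
    positive = ranked[::-1][:k]     # highest scores first
    return positive, negative

# Toy example: a 2-dimensional residual stream and a 4-token vocabulary (made-up values).
tokens = [" praise", " Levin", "etra", " the"]
decoder_dir = [1.0, -0.5]
W_U = [
    [2.0, 0.0, -0.5, 0.1],
    [0.2, 1.0, 0.2, 0.0],
]
scores = logit_effects(decoder_dir, W_U)
positive, negative = top_and_bottom(scores, tokens, k=1)
print(positive, negative)
```

With these toy values, " praise" scores highest and "etra" lowest; a real dashboard would run the same projection over the full vocabulary and report the top ten in each direction.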
    Activation Density: 0.163%
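Activation density is the share of tokens on which the feature fires, so 0.163% works out to roughly 1.6 activating tokens per 1,000. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the page does not state the threshold, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0 by default)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Toy example: 2 activating tokens out of 1,000.
acts = [0.0] * 998 + [1.2, 0.4]
print(activation_density(acts))
```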

    No Known Activations