    Explanations

    instances of praise or compliments

    Negative Logits
    Token          Logit
    " Accepted"    -0.15
    "окон"         -0.14
    "avern"        -0.14
    "ina"          -0.13
    "Warn"         -0.13
    "Symbols"      -0.13
    ".ant"         -0.13
    " accepted"    -0.13
    "ellig"        -0.13
    " ARGS"        -0.13

    Positive Logits
    Token               Logit
    " compliment"       0.47
    " praise"           0.47
    " complement"       0.41
    " compliments"      0.40
    " praises"          0.39
    " comple"           0.34
    " praising"         0.34
    " complimentary"    0.33
    " praised"          0.30
    "è¤"                0.27

    Activation Density: 0.289%

    No Known Activations