INDEX
    Explanations

    phrases related to negative emotions and criticism

    Negative Logits
    reen      -0.78
    atories   -0.76
    cible     -0.71
    athering  -0.68
     glim     -0.67
    aldi      -0.67
    unker     -0.65
    oult      -0.64
    ativity   -0.63
    atory     -0.63

    Positive Logits
    !!!!!      1.30
    !!!        1.22
    !!         1.18
    ?!         1.13
    !!!!       1.02
    !/         0.98
    @#         0.98
    !"         0.94
    ??         0.93
    @#&        0.91

    Activation Density: 0.013%

    No Known Activations