INDEX
    Explanations

    phrases and terms related to authority and conflict

    Negative Logits
    Token        Logit
    "osaic"      -0.15
    " boyc"      -0.14
    " RESERVED"  -0.14
    "reserved"   -0.14
    "ulus"       -0.14
    "loub"       -0.14
    "@mail"      -0.14
    " refusal"   -0.14
    "ibold"      -0.14
    "å´"         -0.14
    Positive Logits
    Token           Logit
    " control"      0.26
    " eliminate"    0.23
    " qu"           0.23
    " sil"          0.23
    " stop"         0.23
    " cur"          0.22
    "抑"            0.22
    " curb"         0.22
    "-control"      0.22
    " suppression"  0.22
    Activation Density: 0.094%

    No Known Activations