INDEX
    Explanations

    phrases indicating effort or attempt

    phrases related to effort and intention

    Negative Logits
      "ELD"           -0.72
      "zynski"        -0.70
      "OPS"           -0.68
      " Worse"        -0.66
      "EStream"       -0.66
      "UTH"           -0.63
      "activation"    -0.62
      " Liberation"   -0.60
      "IFE"           -0.59
      " Forced"       -0.57
    Positive Logits
      " minimize"      1.55
      " avoid"         1.41
      " ensure"        1.30
      " minim"         1.24
      "avoid"          1.21
      " maintain"      1.21
      " maximize"      1.18
      " keep"          1.18
      " adhere"        1.18
      " emphasize"     1.15
    Activation Density: 0.300%

    No Known Activations