INDEX
    Explanations

    phrases indicating the need to ensure something is done

    reassuring phrases or instructions emphasizing the importance of caution and thoroughness

    Negative Logits
    igmatic  -0.72
    ufact  -0.70
    oub  -0.69
     pione  -0.68
    âĸ¬  -0.66
    option  -0.65
    obb  -0.65
    elta  -0.65
     derog  -0.64
     Flavoring  -0.64
    Positive Logits
     they  0.94
     everyone  0.93
     nobody  0.92
     everything  0.92
     everybody  0.90
     you  0.90
     we  0.86
     that  0.83
     there  0.80
     it  0.73
    Activation Density: 0.031%

    No Known Activations