    Explanations

    phrases indicating instructions or recommendations

    instructions or calls to action

    Negative Logits
    Token           Logit
    " sadd"         -0.69
    " reneg"        -0.68
    " presided"     -0.68
    " upsetting"    -0.66
    " reunited"     -0.66
    " supposedly"   -0.66
    " certainly"    -0.65
    " indeed"       -0.65
    " reunion"      -0.65
    " quo"          -0.64
    Positive Logits
    Token           Logit
    " Use"          3.31
    "Use"           2.54
    " Uses"         2.20
    " USE"          2.07
    "use"           1.81
    " Usage"        1.76
    " use"          1.73
    " Using"        1.69
    " Used"         1.54
    "Using"         1.43
    Activation Density: 0.012%

    No Known Activations