    Explanations

    sentences containing the word "unique"

    Negative Logits
    Token          Logit
    " Worse"       -0.83
    " Concern"     -0.70
    "UGH"          -0.67
    "intel"        -0.65
    "shit"         -0.64
    "worn"         -0.64
    "Wr"           -0.63
    "OH"           -0.62
    "idia"         -0.62
    "lest"         -0.60
    Positive Logits
    Token             Logit
    " simplicity"     1.11
    " versatility"    1.07
    " flexibility"    0.89
    " inexpensive"    0.88
    " combines"       0.87
    " allows"         0.85
    " streamlined"    0.84
    " avoids"         0.83
    " unlike"         0.83
    " seamlessly"     0.80
    Activation Density: 0.611%
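
The density figure above is most naturally read as the share of token positions on which the feature fires. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard, whose exact definition is not stated here.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (active positions) / (all positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 1,000 token positions, 6 of them nonzero.
sample = [0.0] * 994 + [0.4, 1.2, 0.7, 2.5, 0.1, 0.9]
print(f"{activation_density(sample):.3%}")  # prints 0.600%
```

Under this reading, a density of 0.611% would correspond to roughly 6 active positions per 1,000 tokens.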

    No Known Activations