INDEX
    Explanations

    phrases expressing personal opinions or comparisons

    Negative Logits
    Token        Logit
    "oust"       -0.83
    "uty"        -0.78
    "uid"        -0.74
    "inion"      -0.73
    "itles"      -0.73
    "olphin"     -0.70
    "nerg"       -0.70
    "OE"         -0.70
    "ulic"       -0.70
    "vantage"    -0.69
    Positive Logits
    Token           Logit
    "lier"          1.05
    " crap"         1.02
    " something"    0.94
    " someone"      0.88
    " somebody"     0.84
    " an"           0.83
    " it"           0.83
    " they"         0.82
    " shit"         0.82
    " a"            0.82
    Act Density 0.060%

    No Known Activations