INDEX
    Explanations

    comparisons using the word "like"

    Negative Logits
    Token            Logit
    "byn"            -0.80
    "conservancy"    -0.67
    "earance"        -0.64
    "alez"           -0.64
    "arcity"         -0.64
    "itions"         -0.64
    "alt"            -0.63
    "oust"           -0.63
    "edom"           -0.63
    "izarre"         -0.63
    Positive Logits
    Token            Logit
    " crap"          0.98
    "lier"           0.93
    " shit"          0.84
    " idiots"        0.73
    " fools"         0.71
    "liest"          0.68
    " they"          0.67
    " outsiders"     0.67
    "lihood"         0.67
    " THEY"          0.65
    Activation Density: 0.032%

    No Known Activations