INDEX
    Explanations

    comparative phrases expressing resemblance

    Negative Logits
    alez        -0.77
    alt         -0.77
    inion       -0.77
    byn         -0.72
    utherland   -0.72
    irtual      -0.72
    arcity      -0.71
    rax         -0.71
    itles       -0.71
    otom        -0.71
    Positive Logits
    lier         1.17
    liest        1.07
     crap        1.03
    lihood       1.00
     shit        0.78
     filler      0.72
     an          0.70
     gib         0.69
     a           0.69
     something   0.69
    Act Density 0.511%

    No Known Activations