INDEX
    Explanations

    tokens that represent identifiers or potential data points

    Negative Logits
    Token        Logit
    " Stew"      -0.80
    " disg"      -0.72
    " wolves"    -0.68
    " seiz"      -0.67
    " eleph"     -0.67
    " Mall"      -0.67
    " microw"    -0.66
    " Insp"      -0.66
    "133"        -0.66
    " Sally"     -0.64
    Positive Logits
    Token        Logit
    "b"          1.19
    "obar"       1.00
    " Bib"       0.97
    "bis"        0.97
    "bs"         0.95
    "bish"       0.95
    "bar"        0.94
    "bin"        0.92
    "B"          0.92
    "baugh"      0.91
    Activation Density: 0.178%

    No Known Activations