INDEX
    Explanations

    possessive forms of words

    Negative Logits
    ’s
    -0.25
     latter
    -0.20
     (“
    -0.18
    ‘s
    -0.17
    一些
    -0.17
    /or
    -0.16
    情况
    -0.16
    声音
    -0.16
    ’m
    -0.15
    -ed
    -0.15
    Positive Logits
     been
    0.30
     got
    0.25
     gonna
    0.24
     not
    0.24
     gotta
    0.22
    been
    0.20
     Been
    0.20
    ÂĿ
    0.20
     BEEN
    0.20
    /'
    0.19
    Act Density 0.300%

    No Known Activations