INDEX
    Explanations

    contractions and possessive forms in the text

    Negative Logits
    dana    -0.17
    地方    -0.16
    声音    -0.16
    手    -0.16
    ’n    -0.16
    情况    -0.15
    shaw    -0.15
    auss    -0.14
    acons    -0.14
    レイ    -0.14
    Positive Logits
    ezier    0.18
    richt    0.17
    ullivan    0.17
    outh    0.15
    ετ    0.14
    ed    0.14
    baum    0.14
    ルド    0.14
    evin    0.14
    phalt    0.13
    Act Density 0.028%

    No Known Activations