    Explanations

    instances of the word "phone."

    NEGATIVE LOGITS
    "acus"      -0.17
    "atta"      -0.16
    "inho"      -0.15
    "idente"    -0.15
    ".kr"       -0.15
    "okud"      -0.15
    "CJK"       -0.14
    "otta"      -0.14
    "acin"      -0.14
    "奉"        -0.14
    POSITIVE LOGITS
    "ubit"       0.14
    "INGTON"     0.14
    "arius"      0.14
    "/he"        0.14
    " Skin"      0.14
    " exit"      0.14
    "ecies"      0.13
    "loo"        0.13
    "984"        0.13
    "/t"         0.13
    Activation density: 0.014%

    No Known Activations