    Explanations

    references to the concept of "from," indicating a focus on origins or sources

    Negative Logits
    "apons"      -0.15
    "roe"        -0.14
    "idan"       -0.14
    "aya"        -0.14
    "lub"        -0.14
    "ربÙĩ"     -0.14
    "/from"      -0.14
    "ramer"      -0.13
    "tero"       -0.13
    "ê³"         -0.13

    Positive Logits
    "scratch"    0.19
    "دÙĪØ§Ø¬"  0.18
    "ĥģ"         0.18
    " scratch"   0.17
    "ãĥ¼ãĤ¹"    0.16
    "éo"         0.16
    "atatype"    0.16
    "Pers"       0.15
    "ån"         0.15
    "stash"      0.15

    Act Density 0.129%

    No Known Activations