    Explanations

    words that indicate actions or directions

    Negative Logits
    "ãģ¤ãģij"   -0.17
    "è¦ĭ"      -0.17
    "á»ħ"      -0.15
    "midt"     -0.15
    "ened"     -0.15
    "stead"    -0.14
    " é©"      -0.14
    "cq"       -0.14
    "ED"       -0.14
    "erd"      -0.14
    Positive Logits
    "/from"    0.34
    "gether"   0.32
    "plevel"   0.21
    "asts"     0.20
    "ledo"     0.19
    " be"      0.19
    "wner"     0.18
    "ogle"     0.18
    "asting"   0.18
    "xic"      0.18
    Activation Density: 0.805%

    No Known Activations