INDEX
    Explanations

    references to actions or intentions involving "to" followed by verbs

    Negative Logits
    つけ        -0.18
    開          -0.16
    見          -0.15
    soever      -0.15
    oppers      -0.14
    cq          -0.14
    erialize    -0.14
    zelf        -0.14
    oulder      -0.14
    apesh       -0.14
    Positive Logits
    /from       0.32
    gether      0.31
    plevel      0.23
    ying        0.21
    tes         0.20
     lỼn        0.20
    tem         0.20
    xic         0.19
    ogle        0.19
    asting      0.19
    Activation Density: 1.691%
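
    The density figure above can be read as the share of token positions on which the feature fires. The sketch below assumes that definition (activation strictly above a threshold of 0); the function name `activation_density` and the sample values are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for one feature over 8 token positions.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3%}")  # prints 12.500%
```

    Under this reading, a density of 1.691% would mean the feature is active on roughly 17 of every 1,000 tokens sampled.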

    No Known Activations