INDEX
    Explanations

    verbs indicating influence or causation

    Negative Logits
     Yourself       -0.22
     yourselves     -0.18
     svůj           -0.16
     yourself       -0.16
    dued            -0.16
    imers           -0.15
    pid             -0.15
    ชม              -0.14
    ับม             -0.14
    odata           -0.14
    Positive Logits
     us             0.28
     them           0.20
     him            0.18
    不了            0.17
    oire            0.15
    -enable         0.15
     me             0.15
     you            0.15
    .bundle         0.15
     itself         0.14
    Activation Density: 0.519%

    No Known Activations