INDEX
    Explanations

    verbs indicating achievement or success

    Negative Logits
    Token            Logit
    " Yourself"      -0.19
    ".FontStyle"     -0.18
    "/REC"           -0.18
    " yourselves"    -0.17
    " yourself"      -0.17
    " unp"           -0.15
    " ourselves"     -0.15
    " svůj"          -0.14
    " oneself"       -0.14
    "pid"            -0.14
    Positive Logits
    Token            Logit
    " us"            0.22
    "ä¸įäºĨ"         0.16
    " him"           0.15
    " them"          0.15
    "isas"           0.14
    " McGr"          0.14
    " bä"            0.14
    "è£ķ"            0.14
    "angles"         0.14
    "alog"           0.14
    Act Density 0.311%

    No Known Activations
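
Some entries in the logit lists above (for example `ä¸įäºĨ` and `è£ķ`) look like raw byte-level BPE token strings rather than readable text. The sketch below shows one way to turn such strings back into text. It assumes, without confirmation from this page, that the underlying tokenizer uses the GPT-2 style byte-to-character table; `decode_token` is a hypothetical helper name, and characters outside that table (such as a plain space) are passed through unchanged.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each byte value to a stand-in character."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes are shifted to code points starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: decode a byte-level token string to text.

    Assumption: the token uses the GPT-2 byte-to-character table.
    Characters not in that table are kept as their own UTF-8 bytes.
    """
    raw = b"".join(
        bytes([UNICODE_TO_BYTE[ch]]) if ch in UNICODE_TO_BYTE else ch.encode("utf-8")
        for ch in token
    )
    # Incomplete byte sequences are shown as U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ä¸įäºĨ", "è£ķ", "angles"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `ä¸įäºĨ` decodes to `不了` and `è£ķ` to `裕`, while plain ASCII tokens such as `angles` come back unchanged. If the model behind this page uses a different tokenizer, the mapping will not apply.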