INDEX
    Explanations

    actions and their effects in interpersonal and social contexts

    NEGATIVE LOGITS
    ibo
    -0.16
    iyel
    -0.16
    acht
    -0.15
    ickey
    -0.15
     Pazar
    -0.15
    .base
    -0.14
    onders
    -0.14
    ерг
    -0.14
    hv
    -0.14
    athon
    -0.14
    POSITIVE LOGITS
     differently
    0.23
     vlastně
    0.17
    obre
    0.17
     differs
    0.17
     вообще
    0.16
     differ
    0.16
    /react
    0.16
    obra
    0.15
    iffin
    0.15
    ombres
    0.15
    Act Density 0.175%

    No Known Activations