    Explanations

    behaviors and actions, particularly in social contexts

    Negative Logits
    " Easily"    -0.18
    " easily"    -0.18
    "erif"       -0.16
    "imple"      -0.15
    "ød"         -0.14
    "bia"        -0.14
    "gon"        -0.14
    "onte"       -0.14
    "aint"       -0.14
    "pts"        -0.14
    Positive Logits
    " differently"    0.31
    " like"           0.30
    " according"      0.22
    "_like"           0.21
    " Like"           0.21
    " LIKE"           0.20
    "наче"            0.20
    " contrary"       0.20
    "Like"            0.20
    " manner"         0.19
    Activation Density: 0.050%

    No Known Activations