INDEX
    Explanations

    phrases describing the actions and characteristics of certain groups or entities

    Negative Logits
     itself
    -0.37
     its
    -0.24
     Its
    -0.19
    Its
    -0.18
    eme
    -0.16
     उसक
    -0.16
    ara
    -0.16
    及其
    -0.16
     kendini
    -0.15
     نفسه
    -0.15
    Positive Logits
     themselves
    0.34
     selves
    0.20
    /we
    0.18
    śmy
    0.17
     respectively
    0.17
     lượt
    0.17
    re
    0.17
    ’re
    0.16
    umber
    0.15
    atical
    0.15
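Dashboards of this kind commonly build the two logit lists by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the most negative and most positive tokens. The sketch below assumes that convention (a plain dot product, ignoring any final layer norm or bias), and the names `logit_effects` and `top_and_bottom`, along with the toy vectors and three-token vocabulary, are hypothetical and exist only for illustration.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding column.

    Assumption: unembed is stored as d_model rows x vocab_size columns,
    and no layer norm or bias is applied before the projection.
    """
    scores = []
    for j, token in enumerate(vocab):
        column = [row[j] for row in unembed]  # unembedding vector for token j
        score = sum(d * w for d, w in zip(decoder_dir, column))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Return (k most positive, k most negative) (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    positive = ranked[-k:][::-1]  # largest first
    negative = ranked[:k]         # most negative first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a hypothetical 3-token vocabulary.
    vocab = [" themselves", " itself", " respectively"]
    decoder_dir = [1.0, -0.5]
    unembed = [
        [0.3, -0.2, 0.1],
        [-0.1, 0.3, 0.0],
    ]
    scores = logit_effects(decoder_dir, unembed, vocab)
    positive, negative = top_and_bottom(scores, 1)
    print("positive:", [t for t, _ in positive])
    print("negative:", [t for t, _ in negative])
```

With these toy numbers " themselves" scores about 0.35 and " itself" about -0.35, which mirrors the shape (though not the real values) of the lists above.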
    Activation Density 0.377%
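Activation density is usually read as the share of sampled token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the `threshold` parameter and the function name `activation_density` are hypothetical), shows that 0.377% corresponds to roughly 377 firing positions per 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds threshold.

    Assumption: a feature "fires" when its activation is strictly
    greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 100,000 tokens, of which 377 have a positive activation.
    sample = [0.0] * 99_623 + [1.2] * 377
    print(f"{activation_density(sample):.3%}")
```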

    No Known Activations