    Explanations

    words related to influence, responsibility, and social relationships

    Negative Logits
    Token        Logit
    "utin"       -0.20
    "เหล"        -0.16
    "subtype"    -0.15
    "zung"       -0.15
    "pitch"      -0.15
    "repid"      -0.15
    "andas"      -0.14
    " blinds"    -0.14
    " AJ"        -0.14
    "RICT"       -0.14
    Positive Logits
    Token             Logit
    "271"             0.15
    ".CommandType"    0.14
    " Copp"           0.14
    " Hakk"           0.14
    " UNU"            0.14
    "ریف"             0.13
    " süt"            0.13
    "iê"              0.13
    "件"              0.13
    "938"             0.13
    Act Density 0.007%

    No Known Activations