    Explanations

    expressions of friendliness and supportive social interactions

    Negative Logits
    Token       Logit
    "ors"       -0.20
    "CCA"       -0.16
    "ÑĨÑİ"      -0.15
    "åĸľ"       -0.15
    "iled"      -0.14
    "sf"        -0.14
    "576"       -0.14
    "ساÙĨ"      -0.14
    "uiltin"    -0.14
    "odzi"      -0.14
    Positive Logits
    Token         Logit
    "lier"        0.24
    "liness"      0.21
    " confines"   0.20
    "liest"       0.19
    "lies"        0.18
    "ships"       0.17
    " disposed"   0.17
    " faces"      0.17
    "ness"        0.17
    " enough"     0.17
    Activation Density: 0.023%

    No Known Activations