INDEX
    Explanations

    interpersonal relationships and social interactions

    Negative Logits
    imers
    -0.17
    zos
    -0.16
    ams
    -0.15
    hear
    -0.14
    otts
    -0.14
    ingham
    -0.14
    igo
    -0.14
    AMS
    -0.14
     Daw
    -0.14
    lucent
    -0.13
    Positive Logits
     whether
    0.20
     questions
    0.19
    whether
    0.18
     Whether
    0.17
     why
    0.17
    礼
    0.16
    是否
    0.16
     permission
    0.16
    ade
    0.15
    Whether
    0.15
    Act Density 0.099%

    No Known Activations