INDEX
    Explanations

    social interactions and relationships

    Negative Logits
    ±         -0.17
    ÌĨ        -0.16
    ENSE      -0.15
    .gt       -0.14
    ngth      -0.14
    okrat     -0.14
    CodeAt    -0.13
    uiltin    -0.13
    جة        -0.13
    uth       -0.13
    Positive Logits
     yard     0.17
     whom     0.15
    _DAT      0.15
     shame    0.15
     Shame    0.14
    essler    0.14
    INVAL     0.14
     Yard     0.14
     lack     0.13
    ávÄĽ      0.13
    Act Density 0.125%

    No Known Activations