INDEX
    Explanations

    interactions and behaviors that emphasize respect and treatment of individuals

    Negative Logits
    " readily"   -0.16
    " easily"    -0.15
    "lac"        -0.15
    "lew"        -0.14
    "mania"      -0.14
    " hos"       -0.14
    "imple"      -0.14
    "483"        -0.14
    "avra"       -0.14
    "478"        -0.13
    Positive Logits
    " differently"   0.39
    "наче"           0.26
    " according"     0.22
    "according"      0.21
    " accordingly"   0.19
    "按"             0.19
    " accordance"    0.18
    " diffé"         0.18
    "odash"          0.17
    "得"             0.17
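    The page does not say how these logit scores are produced. A common convention for SAE feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the most negative and most positive entries. The Python sketch below assumes that convention; `top_logit_effects`, `decoder_vec`, `W_U`, and `vocab` are hypothetical names, and any scaling or normalization the dashboard may apply is not modeled.

```python
import numpy as np


def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    Assumption: the scores come from decoder_vec @ W_U, where
      decoder_vec has shape (d_model,)   -- the feature's decoder direction
      W_U         has shape (d_model, V) -- the model's unembedding matrix
    and vocab is a list of V token strings.
    """
    effects = decoder_vec @ W_U          # one logit contribution per token
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Toy data: a 2-dimensional residual stream and a 4-token vocabulary.
    vocab = [" differently", " readily", " according", " easily"]
    W_U = np.array([[0.39, -0.16, 0.22, -0.15],
                    [0.00, 0.00, 0.00, 0.00]])
    neg, pos = top_logit_effects(np.array([1.0, 0.0]), W_U, vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

    With real weights, `k=10` would yield two ten-row lists shaped like the ones above; the toy numbers here are illustrative only.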
    Act Density 0.295%
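    Activation density is usually reported as the share of sampled tokens on which the feature's activation is above zero. The sketch below assumes that definition with a zero threshold; the sample size and threshold behind the 0.295% figure are not stated on this page, and `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: `acts` holds one activation value per sampled token.
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


if __name__ == "__main__":
    # 59 firing tokens out of 20,000 sampled tokens gives a density of 0.295%.
    acts = np.zeros(20_000)
    acts[:59] = 1.0
    print(f"Act Density {activation_density(acts):.3f}%")
```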

    No Known Activations