INDEX
    Explanations

    phrases related to personal identity and self-perception

    Negative Logits
    "257"         -0.15
    "輪"          -0.14
    "435"         -0.14
    "echa"        -0.13
    "çĦ¡ãģĹãģ"    -0.13
    "rve"         -0.13
    "olith"       -0.13
    " Ðļод"       -0.13
    "/rss"        -0.13
    "ãģªãģĮãĤī"   -0.13
    Positive Logits
    " behaves"    0.19
    " handled"    0.18
    " behaved"    0.18
    " behave"     0.17
    " handles"    0.17
    " обÑģÑĤ"     0.17
    " Handles"    0.17
    " вÑĭглÑıд"   0.17
    "æī±"         0.16
    " behand"     0.16
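On many sparse-autoencoder feature dashboards, a feature's "logits" are its direct effect on the output vocabulary: the feature's decoder direction is multiplied by the model's unembedding matrix, and the most negative and most positive entries are listed, as in the two lists above. This page does not state how its values were produced, so the sketch below only assumes that method. `VOCAB`, `DECODER_DIR`, `UNEMBED`, `logit_effects` and `top_and_bottom` are all hypothetical names, and the weights are toy numbers, not real model parameters.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy data: a 2-dimensional residual stream and a 4-token vocabulary.
# Token strings are borrowed from the lists above; the weights are invented.
VOCAB = [" behaves", " handled", "/rss", "olith"]
DECODER_DIR = [1.0, 0.5]             # the feature's decoder direction (d_model values)
UNEMBED = [                          # d_model rows x vocab_size columns
    [0.2, 0.1, -0.1, 0.0],
    [0.0, 0.1, -0.1, -0.2],
]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return decoder_dir @ unembed: one direct logit effect per vocabulary token."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(vocab: Sequence[str], scores: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split tokens into the k most positive and the k most negative effects."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return positive, negative


if __name__ == "__main__":
    scores = logit_effects(DECODER_DIR, UNEMBED)
    positive, negative = top_and_bottom(VOCAB, scores, k=2)
    print("Positive:", [(tok, round(s, 2)) for tok, s in positive])
    print("Negative:", [(tok, round(s, 2)) for tok, s in negative])
```

Real dashboards may also normalize the decoder direction or apply other post-processing before ranking; this sketch does neither.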
    Activation Density: 0.177%
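Activation density is usually reported as the percentage of tokens on which a feature fires at all. A minimal sketch, assuming "fires" means an activation strictly greater than zero. The `activation_density` function, its `threshold` parameter and the sample data are hypothetical illustrations, not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    The strict "> 0" default is an assumption about how a firing is counted.
    """
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 1,000 tokens, of which 2 activate the feature.
    sample = [0.0] * 998 + [0.3, 1.2]
    print(f"{activation_density(sample):.3f}%")   # 0.200%
    # Reading the reported figure the other way round: one firing per N tokens.
    print(round(100.0 / 0.177))                   # 565
```

Read this way, a density of 0.177% means the feature fires on roughly one token in 565.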

    No Known Activations