    Explanations

    phrases related to personal identity and social connections

    Negative Logits
     are       -0.19
    是在       -0.16
    lerdir     -0.16
     is        -0.16
    veis       -0.16
    は         -0.16
    是我       -0.15
    جه         -0.15
     adalah    -0.14
    ijken      -0.14
    Positive Logits
    AtPath     0.17
     been      0.16
    been       0.16
    Been       0.15
     été       0.15
    367        0.14
    mite       0.14
    loon       0.14
    ビー       0.14
    186        0.13
    Act Density 0.012%

    No Known Activations