INDEX
    Explanations

    expressions of emotional support and familial relationships

    Negative Logits
     him
    -0.17
     herself
    -0.17
    เอง
    -0.16
     oneself
    -0.16
    him
    -0.15
     insanlar
    -0.14
    iy
    -0.14
    iry
    -0.14
     Him
    -0.14
    šku
    -0.14
    Positive Logits
     her
    0.38
     their
    0.32
     Her
    0.30
     HER
    0.30
     deren
    0.28
    Her
    0.25
     Their
    0.24
     ihr
    0.23
    her
    0.23
    their
    0.23
    Act Density 0.336%

    No Known Activations