INDEX
    Explanations

    statements related to personal or professional announcements and experiences

    Negative Logits
    His
    -0.27
     Himself
    -0.26
     His
    -0.22
     Him
    -0.20
     oneself
    -0.19
     إليه
    -0.17
     Jeho
    -0.16
     Його
    -0.16
     Его
    -0.16
    他的
    -0.15
    Positive Logits
    he
    0.50
     he
    0.45
     HE
    0.35
     hed
    0.34
     hc
    0.32
     h
    0.30
    _he
    0.30
     он
    0.29
    HE
    0.29
     hes
    0.29
    Act Density 0.423%

    No Known Activations