INDEX
    Explanations

    phrases emphasizing the characteristics or descriptions of various subjects

    Negative Logits
     itself
    -0.41
    的一个
    -0.23
    它
    -0.20
    是一个
    -0.20
    是个
    -0.19
     its
    -0.18
    一个
    -0.18
     которое
    -0.18
    一个人
    -0.17
    一個
    -0.17
    Positive Logits
     themselves
    0.50
     ones
    0.31
    些
    0.27
     thems
    0.23
     are
    0.23
    nt
    0.22
     những
    0.22
     those
    0.22
    ones
    0.21
     mga
    0.21
    Act Density 0.743%

    No Known Activations