INDEX
    Explanations

    identifying 'a type of' classifications

    NEGATIVE LOGITS
    一种
    -0.13
     types
    -0.13
    一個
    -0.12
     kinds
    -0.12
    一些
    -0.11
    ような
    -0.11
    一点
    -0.11
    一切
    -0.11
    wcs
    -0.11
     типа
    -0.11
    POSITIVE LOGITS
    face
    0.10
    orm
    0.10
    etting
    0.10
    ichi
    0.10
    ead
    0.10
    ...
    0.09
     thing
    0.09
    ew
    0.09
     carta
    0.09
    態
    0.09
    Act Density 0.053%

    No Known Activations