INDEX
    Explanations

    phrases indicating examples or instances

    Negative Logits
    ibold
    -0.17
    망
    -0.15
    IP
    -0.15
    ohon
    -0.15
    aina
    -0.15
    ohana
    -0.15
    ohan
    -0.14
    č
    -0.13
    STALL
    -0.13
    itness
    -0.13
    Positive Logits
     those
    0.69
    those
    0.63
     Those
    0.59
    Those
    0.57
    那些
    0.46
    那种
    0.45
     ceux
    0.34
    那个
    0.34
     تلك
    0.32
     celui
    0.30
    Act Density 0.167%

    No Known Activations