INDEX
    Explanations

    phrases indicating a range or selection of options

    NEGATIVE LOGITS
     anderen
    -0.16
    另外
    -0.15
     其他
    -0.15
     OTHER
    -0.15
    nier
    -0.15
     altri
    -0.14
    other
    -0.14
    另一
    -0.14
    ">×</
    -0.14
    OTHER
    -0.14
    POSITIVE LOGITS
     simple
    0.30
     humble
    0.26
     smallest
    0.25
    simple
    0.24
     simples
    0.24
    简单
    0.23
     simplest
    0.23
     small
    0.22
     basic
    0.21
     inception
    0.21
    Act Density 0.070%

    No Known Activations