    Negative Logits
    Token        Logit
    " SINGLE"    -0.10
    " Ones"      -0.10
    "-three"     -0.09
    "818"        -0.08
    "ONES"       -0.08
    "egin"       -0.08
    "単"         -0.08
    "끔"         -0.08
    "urus"       -0.08
    "ูน"         -0.08
    Positive Logits
    Token        Logit
    " one"       0.71
    " satu"      0.42
    " одного"    0.36
    "one"        0.34
    " jedné"     0.33
    " eines"     0.33
    " jednoho"   0.33
    " 하나"      0.32
    " одна"      0.31
    " одной"     0.30
    Activation Density: 0.181%
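
    The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and only illustrate the arithmetic:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens where the feature fires at all.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 181 of 100,000 tokens.
sample = [1.0] * 181 + [0.0] * 99_819
print(f"{activation_density(sample):.3%}")
```

    Under this assumption, a density of 0.181% corresponds to the feature firing on roughly 18 out of every 10,000 tokens.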

    No Known Activations