    Negative Logits
    Token          Value
    "νή"           0.54
    "ნის"          0.50
    "و"            0.50
    " invita"      0.48
    (blank)        0.47
    "ΝΑ"           0.47
    " segundos"    0.46
    "원을"         0.46
    " свое"        0.46
    "نيا"          0.46
    Positive Logits
    Token          Value
    " cat"         0.47
    " הצ"          0.46
    " ser"         0.45
    " l"           0.43
    " C"           0.42
    " tr"          0.42
    " cinder"      0.41
    " dungeon"     0.41
    "s"            0.41
    "lev"          0.41
    Activation Density: 0.046%

    No Known Activations