INDEX
    Explanations
    NEGATIVE LOGITS
    " Orang"             0.35
    "pel"                0.34
    " Ben"               0.33
    " كار"               0.33
    " Unt"               0.32
    "atsch"              0.32
    " У"                 0.31
    " Бы"                0.31
    " U"                 0.31
    (token not shown)    0.31
    POSITIVE LOGITS
    " failures"          0.39
    " militari"          0.37
    (token not shown)    0.37
    " occupiers"         0.36
    " faltando"          0.35
    " cláus"             0.34
    " causado"           0.34
    " haloes"            0.34
    " femmes"            0.34
    " viên"              0.33
    Act Density 0.003%

    No Known Activations