INDEX
    Negative Logits
     αλλά  -2.86
    -2.75
     namun  -2.48
     โดย  -2.44
    ración  -2.42
     tetapi  -2.41
     éditoriale  -2.38
     και  -2.34
    opardy  -2.34
     mutta  -2.33
    Positive Logits
    3.02
     but  2.36
    2.25
    2.25
    z  2.17
     zwart  2.14
    on  2.09
    D  2.06
     indispensable  2.05
    of  2.02
    Activation Density: 0.002%

    No Known Activations