    Explanations

    no + thing being excluded

    Negative Logits
    Token             Value
    "k"               2.13
    "te"              1.98
    "0"               1.95
    "ming"            1.86
    "sa"              1.80
    "me"              1.80
    (blank)           1.79
    "i"               1.77
    " endeav"         1.75
    " socialize"      1.71
    Positive Logits
    Token             Value
    " obstante"       2.28
    " entanto"        2.03
    "但是在"          1.95
    "一丝"            1.94
    "ور"              1.88
    "ב"               1.87
    (blank)           1.87
    " 불구하고"       1.73
    " tuttavia"       1.70
    (blank)           1.70
    Activation Density: 0.182%

    No Known Activations