    Explanations

    introducing an alternative

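    Negative Logits
    Token          Gloss                                         Logit
    "有無"         Japanese/Chinese: presence or absence         0.73
    " /,"          punctuation                                   0.70
    "មិន"          Khmer: not                                    0.69
    "不易"         Chinese: not easy                             0.68
    (missing)                                                    0.68
    " Jangan"      Indonesian/Malay: don't                       0.67
    "没有什么"     Chinese: there is nothing                     0.66
    " không"       Vietnamese: no, not                           0.66
    "মান"          Bengali: value, standard                      0.66
    (missing)                                                    0.66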
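    Positive Logits
    Token          Gloss                                         Logit
    " instead"                                                   3.08
    " Instead"                                                   2.74
    "Instead"                                                    2.69
    "instead"                                                    2.40
    " বরং"         Bengali: rather                               2.07
    " invece"      Italian: instead                              1.93
    "而是"         Chinese: but rather                           1.88
    " вместо"      Russian: instead of                           1.82
    "代わりに"     Japanese: instead                             1.77
    " rather"                                                    1.72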
    Activation Density: 0.756%
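
    The page does not define the density figure. It is commonly read as the fraction of sampled token positions on which the feature's activation is nonzero. A minimal sketch under that assumption follows; the function name, the threshold parameter, and the sample values are hypothetical and do not come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (active positions) / (all sampled positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for one feature over 8 token positions.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 12.500%
```

    Under this reading, a density of 0.756% would mean the feature fires on roughly 1 in 130 sampled tokens.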

    No Known Activations