    NEGATIVE LOGITS
    Logit  Token
    0.33   '্বাস'
    0.33   ' ||'
    0.32   ' antaranya'
    0.31
    0.31   'ři'
    0.30   '&&\'
    0.30   ' Depends'
    0.30   ' Ren'
    0.30   '":{"'
    0.29   ' &&'
    POSITIVE LOGITS
    Logit  Token
    3.38   ' instead'
    3.31   ' rather'
    3.17   '而不是'
    3.06   'instead'
    2.97   'rather'
    2.86   '而非'
    2.83   ' вместо'
    2.77   ' plutôt'
    2.70   'Instead'
    2.70   'Rather'
    Activation Density: 0.170%

    No Known Activations