INDEX
    Explanations
    Negative Logits
        " development"   -0.30
        "϶"              -0.30
        "antis"          -0.28
        " antis"         -0.27
        "关键是"          -0.25
        "contres"        -0.25
        " Bib"           -0.25
        "clusive"        -0.25
        " Doll"          -0.24
        "restricted"     -0.24
    Positive Logits
        "arti"            0.29
        "咸"              0.29
        "acja"            0.28
        ".CommandType"    0.28
        "两级"            0.27
        " tín"            0.26
        "万亿"            0.26
        "ogui"            0.25
        "代替"            0.25
        "三条"            0.25
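The two lists above rank vocabulary tokens by how strongly this feature lowers or raises their output logits. One common way to produce such lists, assuming the feature has a decoder direction in the residual stream and the model has an unembedding matrix W_U, is to take the product of the two and sort the result. The sketch below is hypothetical: `logit_effects`, `top_logits`, and the toy matrices are illustrative names and data, not the API of the tool that produced this page, and it is not known whether the values shown here include any extra normalization.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers: names and data layout are illustrative assumptions.


def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return decoder_dir @ W_U, one value per vocabulary token.

    w_unembed is assumed to be laid out as d_model rows by vocab_size columns.
    """
    d_model = len(decoder_dir)
    vocab_size = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][v] for i in range(d_model))
        for v in range(vocab_size)
    ]


def top_logits(effects: Sequence[float], vocab: Sequence[str], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest effects first
    positive = ranked[::-1][:k]    # largest effects first
    return negative, positive


if __name__ == "__main__":
    # Toy example: 2-dimensional residual stream, 4-token vocabulary.
    decoder_dir = [1.0, 0.5]
    w_unembed = [[0.2, -0.4, 0.0, 0.6],
                 [0.2, 0.0, -0.2, -0.4]]
    effects = logit_effects(decoder_dir, w_unembed)
    neg, pos = top_logits(effects, ["a", "b", "c", "d"], k=2)
    print("negative:", neg)
    print("positive:", pos)
```

With a real model, `vocab` would be the tokenizer's token strings and `k` would be 10 to match the lists on this page.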
    Activation Density: 0.065%
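The activation density figure is usually read as the percentage of sampled tokens on which the feature fires; under that reading, 0.065% corresponds to about 65 firing tokens per 100,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (the function name `activation_density` is hypothetical, and the page does not state its exact definition):

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumes density means the fraction of sampled tokens where the feature
    fires, with firing defined as activation > threshold.
    """
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return 100.0 * firing / total


if __name__ == "__main__":
    # 65 firing tokens out of 100,000 corresponds to a density of 0.065%.
    acts = [1.0 if i % 1000 == 0 and i < 65_000 else 0.0 for i in range(100_000)]
    print(f"{activation_density(acts):.3f}%")
```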

    No Known Activations