INDEX
    Negative Logits
     anlamına    0.50
    although    0.49
    因为它    0.49
    Cela    0.40
    because    0.39
     ಏಕೆಂದರೆ    0.39
    之类的    0.38
     സമയം    0.38
    meaning    0.37
    番号    0.37
    Positive Logits
    :    0.50
     onwards    0.45
     -    0.44
    >    0.44
     ~    0.39
     Ди    0.38
        0.37
    +:    0.36
     onward    0.36
     回転    0.36
    Act Density 0.017%

    No Known Activations