    NEGATIVE LOGITS
    Token     Logit
    C         0.94
    M         0.94
    T         0.90
    R         0.90
    selves    0.87
     dàng     0.86
    B         0.86
    Cpu       0.84
    D         0.84
    Radius    0.82
    POSITIVE LOGITS
    Token     Logit
    та        1.32
    𝚑         1.09
    𝚜         1.05
    𝚞         1.05
    ig        1.02
    𝘱         1.02
    𝘰         1.00
    ung       1.00
    𝘣         1.00
    很多      0.99
    Activation Density: 0.248%

    No Known Activations