INDEX
    Explanations
    Negative Logits
    "ประโย"          -0.08
    " railways"      -0.08
    " eig"           -0.08
    " interracial"   -0.07
    " bạc"           -0.07
    (blank)          -0.07
    "_kategori"      -0.07
    " hamm"          -0.07
    "سلم"            -0.07
    "eiß"            -0.07
    Positive Logits
    "rior"           0.07
    (blank)          0.07
    "动作"           0.07
    "los"            0.06
    "-------↵↵"      0.06
    (blank)          0.06
    "owner"          0.06
    "priority"       0.06
    "mong"           0.06
    " Added"         0.06
    Activation Density: 0.001%

    No Known Activations