    Negative Logits
     pagan
    -0.07
     gypsum
    -0.07
    岛上
    -0.07
     Sullivan
    -0.07
    توقيع
    -0.07
     постоян
    -0.06
     namoro
    -0.06
     Instrument
    -0.06
     prayer
    -0.06
    estinal
    -0.06
    Positive Logits
    0.07
    领军
    0.06
     Prof
    0.06
    0.06
     tts
    0.06
     cx
    0.06
    0.06
     [][]
    0.06
     Rolled
    0.06
    把控
    0.06
    Activation Density: 0.284%

    No Known Activations