INDEX
    Explanations
    Negative Logits
    " findings"     -0.07
    " certainty"    -0.06
    ".exam"         -0.06
    "اران"          -0.06
    "been"          -0.06
    " refr"         -0.06
    "too"           -0.06
    " hepat"        -0.06
    "coil"          -0.06
    "Pear"          -0.06
    Positive Logits
    "яг"            0.07
    "طاق"           0.06
    " cakes"        0.06
    " `%"           0.06
    " fw"           0.06
    " pathetic"     0.06
    " comply"       0.06
    "ือก"           0.06
    " я"            0.06
    "seek"          0.06
    Activation Density: 0.020%

    No Known Activations