INDEX
    Explanations
    NEGATIVE LOGITS
    " tweede"       -0.07
    "ë"             -0.07
    " second"       -0.07
    " asper"        -0.07
    "cookie"        -0.07
    " interplay"    -0.07
    " দ্বিত"         -0.07
                    -0.07
    "ப்பட்ட"         -0.07
    POSITIVE LOGITS
    ".before"        0.10
    " tupu"          0.09
    " kafin"         0.09
    "-before"        0.09
    " ngaph"         0.09
    " Outside"       0.08
    " sebelum"       0.08
    " Beyond"        0.08
    "\tbefore"       0.08
    " مخکې"          0.08
    Act Density 0.050%

    No Known Activations