INDEX
    Explanations
    Negative Logits
    Token            Logit
    " lapse"         -0.07
    " کند"           -0.07
    "ứt"             -0.07
    " quarry"        -0.07
    " whilst"        -0.07
    " Nearly"        -0.07
    " Healthy"       -0.07
    " yaklaşık"      -0.07
    " afternoon"     -0.07
    " gew"           -0.07
    Positive Logits
    Token            Logit
    " Dom"           0.12
    " Dominic"       0.10
    " dom"           0.09
    " Domin"         0.09
    "mand"           0.08
    " domin"         0.08
    "Dom"            0.08
    " mand"          0.07
    "min"            0.07
    "ми"             0.07
    Activation Density: 0.008%

    No Known Activations