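    Negative Logits
        "jenige"        -0.08
        "curity"        -0.08
        " WAL"          -0.08
        "expiration"    -0.07
        " landed"       -0.07
        " tilbyder"     -0.07
        "ورس"           -0.07
        (token text not recovered)    -0.07
        "-bodied"       -0.07
        " gat"          -0.07

    Positive Logits
        (token text not recovered)    0.08
        " अलावा"        0.08
        " tricky"       0.08
        " Dij"          0.07
        (token text not recovered)    0.07
        " হলে"          0.07
        (token text not recovered)    0.07
        " Espero"       0.07
        " systematic"   0.07
        " সুবিধ"        0.07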
    Activation Density: 0.034%

    No Known Activations