INDEX
    Explanations
    Negative Logits
    ynes         -0.07
     lawmaker    -0.07
    jack         -0.07
    ürk          -0.06
     Kız         -0.06
     compound    -0.06
     landmark    -0.06
     κό          -0.06
    ¬            -0.06
    conte        -0.06
    Positive Logits
     do           0.08
     doing        0.07
     did          0.07
     procedures   0.06
     does         0.06
     لع           0.06
     داو          0.06
    /js           0.06
    Defs          0.06
    /person       0.06
    Activation Density: 0.063%

    No Known Activations