INDEX
    Explanations
    Negative Logits
    Token        Logit
    ショ         -0.07
     fila        -0.06
     loa         -0.06
    dirty        -0.06
     mug         -0.06
    νώ           -0.06
    цями         -0.06
     dirty       -0.06
    bare         -0.06
    ("+          -0.06
    Positive Logits
    Token        Logit
     parc        0.07
    assessment   0.06
     heard       0.06
     Reddit      0.06
     crackdown   0.06
    ْع           0.06
     черв        0.06
    Founder      0.06
    /trans       0.06
    	DECLARE     0.06
    Activation Density: 0.056%

    No Known Activations