INDEX
    Explanations
    Negative Logits
     Bernie      -0.07
    '>↵          -0.06
     مكان        -0.06
     Ссылки      -0.06
     flagged     -0.06
    ̉            -0.06
     ucwords     -0.06
     naï         -0.06
     Destiny     -0.06
                 -0.06
    Positive Logits
     Emotional       0.07
    POCH             0.06
    much             0.06
    bron             0.06
                     0.06
    	                 0.06
    _RE              0.06
     competitive     0.06
    çek              0.06
    upal             0.06
    Activation Density 0.030%

    No Known Activations