INDEX
    NEGATIVE LOGITS
    Token            Logit
    " stems"         -0.07
    " Anth"          -0.06
    "داری"           -0.06
    ":Object"        -0.06
    "ゲーム"          -0.06
    "ColumnsMode"    -0.06
    "leg"            -0.06
    " Роб"           -0.06
    " Suicide"       -0.06
                     -0.06
    POSITIVE LOGITS
    Token            Logit
    " debunk"        0.07
    " historically"  0.06
    " Oxygen"        0.06
    "Who"            0.06
    "ouples"         0.06
    " oui"           0.06
    "Yes"            0.06
                     0.06
    "\tEXPECT"       0.06
    " exclaimed"     0.06
    Activation Density: 0.044%

    No Known Activations