    Explanations

    Political/social topics

    Negative Logits
     Escape
    -0.07
    	protected
    -0.07
    		         
    -0.07
     ",",
    -0.07
    _gender
    -0.07
    	                       
    -0.07
     bắt
    -0.07
     visto
    -0.06
    Atlantic
    -0.06
    Purpose
    -0.06
    Positive Logits
     impe
    0.07
     مقر
    0.07
     prům
    0.07
     charismatic
    0.06
     newX
    0.06
    0.06
     SOCIAL
    0.06
    .Free
    0.06
    warehouse
    0.06
    مق
    0.06
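A minimal sketch of how negative and positive logit lists like the ones above are often produced, assuming the feature has a decoder direction in the model's residual stream. That direction is multiplied by the unembedding matrix to get one value per vocabulary token, and the lowest and highest values are listed. Whether these particular values were computed this way is an assumption; the function names, toy vocabulary, and weights below are hypothetical.

```python
def logit_effects(decoder_direction, unembed):
    # Hypothetical helper: unembed[i][j] is the weight from residual
    # dimension i to vocabulary token j. Returns one value per token.
    vocab_size = len(unembed[0])
    return [
        sum(decoder_direction[i] * unembed[i][j] for i in range(len(decoder_direction)))
        for j in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k):
    # Rank tokens by effect: the k lowest form the negative list,
    # the k highest (largest first) form the positive list.
    order = sorted(range(len(effects)), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
direction = [1.0, 0.0]
unembed = [
    [0.5, -0.2, 0.1, 0.0],
    [9.0, 9.0, 9.0, 9.0],
]
neg, pos = top_and_bottom(logit_effects(direction, unembed), vocab, k=2)
print(neg)
print(pos)
```

The values in the real lists are small (about ±0.07), which is typical when a single feature direction is projected onto thousands of tokens; only the ranking, not the magnitude, is what the lists convey.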
    Activation Density 0.022%
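A minimal sketch of the density figure above, assuming it means the fraction of token positions at which the feature's activation is above zero. Under that assumption, 0.022% corresponds to roughly 22 active tokens per 100,000. The function name and the threshold parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    # Hypothetical helper: fraction of token positions whose activation
    # exceeds the threshold (0.0 by default, i.e. "the feature fired").
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 22 active positions out of 100,000 tokens.
acts = [0.0] * 99_978 + [1.0] * 22
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```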

    No Known Activations