INDEX
    Explanations
    Negative Logits
    " Income"         -0.07
    " Spare"          -0.06
    " Maker"          -0.06
    " WORD"           -0.06
    " earn"           -0.06
    " Capability"     -0.06
    " chemistry"      -0.06
    "inition"         -0.06
    "\telem"          -0.06
    " clinic"         -0.06
    Positive Logits
    " protest"                 0.11
    " protests"                0.08
    (blank)                    0.08
    " Protest"                 0.08
    " meget"                   0.07
    "مش"                       0.07
    " }>↵"                     0.07
    (blank)                    0.07
    "    ↵    ↵    ↵    ↵"     0.07
    " floods"                  0.07
    Act Density 0.004%

    No Known Activations