    Negative Logits
    Token               Logit
    " latin"            -0.06
    " god"              -0.06
    " working"          -0.06
    "Strings"           -0.06
    "abilir"            -0.06
    " Virtual"          -0.06
    " antenn"           -0.06
    " bans"             -0.06
    " iktidar"          -0.06
    " Crushing"         -0.06
    Positive Logits
    Token               Logit
    " розвит"           0.07
    " Towards"          0.07
    " تجهیزات"          0.07
                        0.07
    "realDonaldTrump"   0.07
    " Direction"        0.07
    " towards"          0.06
    "OW"                0.06
    " retirees"         0.06
    " olduğuna"         0.06
    Activation Density: 0.010%

    No Known Activations