INDEX
    Explanations
    Negative Logits
     Cette       -0.08
     signals     -0.07
    علی          -0.06
     blister     -0.06
    -date        -0.06
     انتقال      -0.06
     hitting     -0.06
    Ho           -0.06
     currents    -0.06
     fucking     -0.06
    Positive Logits
     determines        0.11
    َان                0.07
     Determines        0.07
    (token not rendered)  0.07
    anal               0.06
    realDonaldTrump    0.06
     göç               0.06
     determined        0.06
    ("(%               0.06
     انسانی            0.06
    Activation Density: 0.018%

    No Known Activations