INDEX
    Explanations
    NEGATIVE LOGITS
    Token        Logit
     offended    -0.07
    eygamber     -0.07
     auction     -0.07
    ANCED        -0.06
     světě       -0.06
     اکتبر       -0.06
     высокой     -0.06
    _RANGE       -0.06
    .native      -0.06
     cheek       -0.06
    POSITIVE LOGITS
    Token        Logit
    σπ           0.07
    Ga           0.07
    Hospital     0.06
     nd          0.06
    station      0.06
     Ga          0.06
                 0.06
     colspan     0.06
     Hispanic    0.06
    _tF          0.06
    Activation Density: 0.055%

    No Known Activations