INDEX
    Explanations
    Negative Logits
    Head    -0.08
    (permission    -0.07
     בחיים    -0.07
    xca    -0.07
    [_    -0.07
    Hop    -0.07
     exposures    -0.06
     computed    -0.06
    Sure    -0.06
     gamma    -0.06
    Positive Logits
     Violence    0.08
    lob    0.08
    ilingual    0.07
    type    0.07
     PLAYER    0.07
     tirelessly    0.07
     địa    0.07
    ласт    0.07
    丝绸    0.07
    bol    0.07
    Act Density 0.004%

    No Known Activations