    Negative Logits
    Token              Logit
    "Removal"          -0.09
    "Hard"             -0.08
    "Original"         -0.07
    "Atomic"           -0.07
    " removal"         -0.07
    "Transactional"    -0.07
    "Insertion"        -0.07
    "Bool"             -0.07
    ".Insert"          -0.07
    "Removed"          -0.07
    Positive Logits
    Token              Logit
    " Farma"           0.08
    " établ"           0.08
    " governmental"    0.08
    " ita"             0.08
    " presidents"      0.08
    " Mog"             0.08
    "-chief"           0.08
    " prezident"       0.08
    "大师"             0.08
    " Centre"          0.08
    Activation Density: 0.001%

    No Known Activations