    Negative Logits
    Token             Logit
    " mano"           -0.08
    " ratings"        -0.07
    " PYTHON"         -0.07
    " manager"        -0.07
    " Yas"            -0.07
    " pay"            -0.07
    " PARAMETERS"     -0.07
    "_outer"          -0.06
    " Newspaper"      -0.06
    "—an"             -0.06

    Positive Logits
    Token             Logit
    " conflict"        0.17
    " Conflict"        0.14
    "Conflict"         0.12
    " conflicts"       0.11
    " conflic"         0.08
    "lict"             0.08
    "licts"            0.07
    " rift"            0.07
    " fraudulent"      0.07
    "UILT"             0.07
    Activation Density: 0.008%

    No Known Activations