    Negative Logits
    Token             Logit
    " discredit"      -0.09
    " outspoken"      -0.09
    " backlash"       -0.08
    "enek"            -0.08
    "çģ½"             -0.08
    "apon"            -0.08
    "kke"             -0.08
    "utter"           -0.08
    " fingert"        -0.08
    " alliances"      -0.08
    Positive Logits
    Token             Logit
    " dispute"        0.31
    " disputes"       0.30
    " differences"    0.27
    "äºī"             0.24
    " conflict"       0.24
    " conflicts"      0.21
    " Differences"    0.21
    " issues"         0.20
    " tranh"          0.20
    " difference"     0.19
    Activation Density: 0.092%

    No Known Activations