    Explanations

    violence and war

    Negative Logits
    ailed           -0.08
    -impact         -0.08
    _aff            -0.08
                    -0.08
    就在            -0.08
     trajectory     -0.08
     atrações       -0.07
     واقع           -0.07
    =value          -0.07
     Affect         -0.07

    Positive Logits
    wirtschaft      0.08
    ಿಗೂ             0.07
    ေတာ့            0.07
     խորհուրդ       0.07
     vigilant       0.07
                    0.07
    оруж            0.07
     raging         0.07
     ós             0.07
     المسلحة        0.07
    Activation Density: 0.021%

    No Known Activations