    Explanations

    self-defense

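    Negative Logits (tokens quoted; a leading space is part of the token)
      " лип"           -0.08   (Russian word fragment)
      "Adv"            -0.08
      " технология"    -0.08   (Russian: "technology")
      "ίνουν"          -0.08   (Greek verb ending)
      " Neue"          -0.08   (German: "new")
      "_adv"           -0.08
      " shuffled"      -0.08
      " ust"           -0.07
      "ज़"              -0.07   (Devanagari letter)
      " Shuffle"       -0.07

    Positive Logits
      " हत्या"          0.10   (Hindi: "killing, murder")
      " оправ"          0.09   (Russian word fragment)
      "performed"       0.09
      " salute"         0.09
      " performed"      0.09
      " retali"         0.09
      " justified"      0.09
      " retaliation"    0.09
      " defensive"      0.09
      " morally"        0.09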
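The logit lists above rank the tokens whose output logits this feature pushes up or down the most. The following is a minimal sketch of one common way such lists are produced: take the dot product of the feature's decoder direction with each column of the model's unembedding matrix, then sort. This is an assumption, not the dashboard's confirmed method. The function names and toy numbers are hypothetical, and no layer norm or scaling is applied.

```python
# Hypothetical sketch: rank a feature's direct effect on output logits.
# Assumption: effect = decoder direction . unembedding column (no layer norm).

def logit_effects(decoder_vec, unembed, vocab):
    """Return (token, score) pairs for every vocabulary entry.

    decoder_vec: list of d_model floats (the feature's decoder direction).
    unembed: d_model x vocab_size matrix given as a list of rows.
    vocab: list of token strings, one per unembedding column.
    """
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        scores.append((token, score))
    return scores


def top_logits(scores, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]                   # most negative first
    positive = list(reversed(ranked[-k:]))  # most positive first
    return positive, negative


# Toy example with a 2-dimensional model and a 4-token vocabulary.
vocab = [" justified", " Shuffle", " defensive", " retaliation"]
unembed = [[0.2, -0.1, 0.0, 0.3],
           [0.0, 0.4, -0.2, 0.1]]
positive, negative = top_logits(logit_effects([1.0, -0.5], unembed, vocab), k=1)
print([t for t, _ in positive])  # → [' retaliation']
print([t for t, _ in negative])  # → [' Shuffle']
```

Sorting the full list once keeps the sketch simple. For a real vocabulary of tens of thousands of tokens, a partial top-k selection would be cheaper.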
    Activation Density: 0.029%
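Activation density is read here as the share of token positions on which the feature fires. On that reading, 0.029% works out to roughly 29 firings per 100,000 tokens. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. Both that threshold and the function name are assumptions rather than details taken from this page.

```python
# Hypothetical sketch: activation density as a percentage of token positions.
# Assumption: a feature "fires" when its activation exceeds `threshold`.

def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation is above `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 29 active positions out of 100,000 tokens.
acts = [0.0] * 99_971 + [1.0] * 29
print(activation_density(acts))  # → 0.029
```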

    No Known Activations