    Explanations

    references to adversarial entities or opponents

    Negative Logits
        Token        Logit
        "pr"         -0.59
        " Norr"      -0.57
        " decía"     -0.56
        "Against"    -0.56
        "er"         -0.55
        "Pr"         -0.54
        "R"          -0.52
        "At"         -0.52
        "S"          -0.51
        "est"        -0.51
    Positive Logits
        Token        Logit
        " Enemy"     1.36
        "Enemy"      1.16
        " Enemies"   1.16
        " enemy"     1.15
        "enemy"      1.15
        "enemies"    1.15
        " enemies"   1.12
        "Enemies"    1.07
        "ennemi"     1.05
        "nemy"       1.00
    Activation Density: 0.006%

    No Known Activations