INDEX
    Explanations

    mentions of adversaries or foes

    references to adversarial characters or entities

    Negative Logits
    "lic"         -0.83
    "eret"        -0.79
    "otide"       -0.77
    "auntlets"    -0.74
    "ced"         -0.73
    "cer"         -0.70
    "otto"        -0.70
    "oled"        -0.69
    "Shot"        -0.69
    "UTH"         -0.69
    Positive Logits
    " enemies"        1.24
    " foe"            1.15
    " enemy"          1.12
    " adversaries"    1.11
    " Enemies"        1.07
    " foes"           1.05
    " Enemy"          0.99
    " undermin"       0.96
    " adversary"      0.91
    "emies"           0.89
    Act Density 0.010%

    No Known Activations