INDEX
    Explanations

    instances of the word "attack" and its variations

    Negative Logits
    "{~"          -0.74
    " much"       -0.61
    " beliebt"    -0.61
    " whole"      -0.61
    "Ав"          -0.61
    " gu"         -0.61
    "RUNTIME"     -0.60
    " ύ"          -0.57
    "गु"          -0.56
    " թվական"     -0.55
    Positive Logits
    " Attack"     1.75
    " ATTACK"     1.65
    " attack"     1.64
    " attacks"    1.59
    " Attacks"    1.57
    "ATTACK"      1.56
    "attack"      1.55
    "Attacks"     1.53
    "attacks"     1.48
    "Attack"      1.44
    Act Density 0.072%

    No Known Activations