INDEX
    Explanations

    mentions of physical violence or aggression, particularly related to the concept of a "dog-eat-dog" competition

    Negative Logits
    éĹĺ         -0.87
    DERR        -0.85
     Edison     -0.81
    esson       -0.78
    artz        -0.78
    oulos       -0.74
    velength    -0.70
    farious     -0.68
     WARN       -0.67
    ORN         -0.67
    Positive Logits
    gie         1.11
    patch       1.10
     barking    1.06
    fighting    1.03
    fight       1.01
    meat        1.00
    matic       0.97
    fights      0.95
    matically   0.94
     catcher    0.94
    Activation Density: 0.032%

    No Known Activations