INDEX
    Explanations

    phrases related to physical violence or aggression involving beating

    repeated references to the word "beat," often in contexts of violence or competition

    Negative Logits
    Token         Logit
    "orrow"       -0.76
    "arij"        -0.70
    " mosqu"      -0.68
    "ortium"      -0.68
    "ateral"      -0.66
    " behavi"     -0.65
    "agher"       -0.64
    "ffe"         -0.64
    "OPLE"        -0.64
    "isal"        -0.63
    Positive Logits
    Token         Logit
    "beat"        1.22
    "boxing"      1.03
    "nik"         0.97
    "tle"         0.94
    "down"        0.93
    "Beat"        0.90
    "sticks"      0.88
    "ework"       0.87
    "rice"        0.84
    " beat"       0.82
    Activation Density: 0.013%

    No Known Activations