INDEX
    Explanations

    references to race-related violence and exploitation

    Negative Logits
    Token           Logit
    373             -0.14
    irts            -0.13
    crets           -0.13
    nst             -0.13
    ennon           -0.12
    uilder          -0.12
    ublished        -0.12
    å°ij女            -0.12
     rains          -0.12
    raj             -0.12

    Positive Logits
    Token           Logit
     expend         0.35
     disposable     0.32
     fodder         0.30
     chatt          0.30
     prey           0.28
     pawn           0.27
     cannon         0.27
     targets        0.26
     collateral     0.26
     Disposable     0.25
    Activation Density: 0.275%

    No Known Activations