INDEX
    Explanations

    references to individuals or groups identified as victims

    Negative Logits
    Token    Logit
    ings     -0.18
    erie     -0.17
    enta     -0.16
    egot     -0.15
    oes      -0.15
    mates    -0.15
    ØŃÙĬ     -0.15
    bons     -0.15
    é¢ĺ      -0.15
    ROS      -0.15
    Positive Logits
    Token       Logit
    hood        0.25
    ized        0.25
    /target     0.20
    atically    0.19
    ively       0.19
    izers       0.19
    ology       0.19
    ization     0.17
    IZED        0.17
    ised        0.16
    Act Density 0.024%
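
For readers unfamiliar with the density figure, here is a minimal sketch of how such a percentage could be computed. It assumes that activation density means the share of sampled tokens on which the feature's activation is above zero; the dashboard does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the fraction of tokens with a nonzero
    (here: above-threshold) activation, expressed as a percentage.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 tokens, 3 of which activate the feature.
sample = [0.0] * 9997 + [0.7, 1.2, 0.05]
print(f"Act Density {activation_density(sample):.3f}%")
```

Under this assumed definition, a density of 0.024% would correspond to roughly 24 active tokens per 100,000 sampled.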

    No Known Activations