INDEX
    Explanations

    phrases associated with violent or conflict-related events or contexts

    words related to identities and categories

    Negative Logits
        enegger    -0.84
        elf        -0.82
        raised     -0.79
        orate      -0.73
        oken       -0.71
        raising    -0.68
        CHA        -0.67
        irth       -0.64
        ioned      -0.63
         Broken    -0.62
    Positive Logits
        idal        1.43
         pend       0.76
        ンジ        0.69
        ュ          0.69
        ity         0.67
        ysis        0.67
        atory       0.66
        oad         0.66
        itous       0.65
        ITIES       0.64
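    The two logit lists are the ten tokens with the most negative and the ten with the most positive values. A minimal sketch of that selection step, assuming the per-token logit effects are already available as a plain list aligned with the vocabulary (how the dashboard derives those values is not stated on this page; the function name is hypothetical):

```python
def top_k_logit_effects(effects, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    Assumption: effects[i] is the logit effect of this feature on vocab[i].
    """
    # Indices sorted from most negative to most positive effect.
    order = sorted(range(len(effects)), key=lambda i: effects[i])
    negative = [(vocab[i], effects[i]) for i in order[:k]]
    # Take the k largest and list them from largest to smallest.
    positive = [(vocab[i], effects[i]) for i in reversed(order[-k:])]
    return negative, positive


if __name__ == "__main__":
    vocab = ["enegger", "elf", "idal", " pend", "ity"]
    effects = [-0.84, -0.82, 1.43, 0.76, 0.67]
    neg, pos = top_k_logit_effects(effects, vocab, k=2)
    print(neg)  # [('enegger', -0.84), ('elf', -0.82)]
    print(pos)  # [('idal', 1.43), (' pend', 0.76)]
```

    Note that leading spaces are part of the token (" pend" and " Broken" are distinct from "pend" and "Broken"), so they are kept as-is.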
    Activation Density: 0.010%
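    A density of 0.010% means the feature fires on roughly one token in every 10,000. A minimal sketch of how such a figure can be computed, assuming density is the percentage of token positions whose activation exceeds a threshold of 0 (the exact threshold and sample used for this page are assumptions; the function name is hypothetical):

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the activation exceeds threshold.

    Assumption: a feature counts as active when activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # One active position out of 10,000 tokens.
    acts = [0.0] * 9999 + [2.5]
    print(f"{activation_density(acts):.3f}%")  # 0.010%
```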

    No Known Activations