INDEX
    Explanations

    references to specific companies or organizations as well as words related to negative social behaviors

    references to companies and antisemitism

    Negative Logits
    Token       Logit
    " Nile"     -0.66
    "åĮ"        -0.66
    "tails"     -0.65
    "isen"      -0.63
    "eering"    -0.61
    "ogy"       -0.60
    " heads"    -0.59
    " Worlds"   -0.59
    "itarian"   -0.58
    "arty"      -0.58
    Positive Logits
    Token       Logit
    "aurus"     1.25
    "creen"     1.23
    "ystem"     1.20
    "earch"     1.17
    "erver"     1.16
    "ocial"     1.12
    "ullivan"   1.10
    "hiba"      1.09
    "cript"     1.05
    "omething"  1.04
    Activation Density: 0.057%

    No Known Activations