INDEX
    Explanations

    words related to controversial political figures and actions

    references to torture and related historical figures

    Negative Logits
    Token       Logit
    Ü           -0.93
    ▬           -0.85
    ä           -0.82
    à           -0.80
    liest       -0.78
    UAL         -0.77
    シ          -0.73
    Minecraft   -0.72
    RGB         -0.71
    ド          -0.71

    Positive Logits
    Token       Logit
     Bolton     0.84
     Tort       0.82
    ramid       0.81
    ongyang     0.81
    ombo        0.78
    kefeller    0.77
    artisan     0.77
    terness     0.76
    odies       0.75
    oreal       0.75
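
    The page does not say how the logit lists were produced. A common approach for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix and rank vocabulary tokens by the result. The sketch below assumes that method; `logit_effects`, `top_and_bottom`, and the toy matrices are hypothetical, and any scaling applied to the displayed values is unknown.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers: assumes the displayed logits come from projecting the
# feature's decoder direction through the unembedding matrix (not confirmed).


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return one score per vocabulary token.

    decoder_dir has length d_model; unembed is a d_model x vocab_size matrix.
    Each score is the dot product of decoder_dir with one unembedding column.
    """
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and the k lowest-scoring tokens."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # lowest scores first
    positive = ranked[::-1][:k]  # highest scores first
    return positive, negative


if __name__ == "__main__":
    # Toy model: d_model = 2, vocabulary of 3 tokens.
    decoder_dir = [1.0, 0.0]
    unembed = [[0.5, -0.2, 0.9],
               [3.0, 3.0, 3.0]]
    vocab = ["a", "b", "c"]
    scores = logit_effects(decoder_dir, unembed)
    print(top_and_bottom(scores, vocab, k=1))
```

    In a real model the vocabulary would be the tokenizer's full vocabulary, which is why byte-level fragments such as "ä" or "ongyang" can appear alongside whole words.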
    Activation Density: 0.023%
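
    Activation density is usually the share of tokens on which a feature fires. Assuming that definition (the page does not state its threshold or dataset), 0.023% corresponds to roughly 23 activating tokens per 100,000. A minimal sketch; `activation_density` is a hypothetical helper:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumes density counts tokens with an activation strictly above the
    threshold; the actual threshold used for this page is not stated.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 23 active tokens out of 100,000 tokens.
    acts = [0.0] * 99_977 + [1.5] * 23
    print(f"{activation_density(acts):.3f}%")
```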

    No Known Activations