INDEX
    Explanations

    terms related to oppression and systemic issues

    Negative Logits
    rido  -0.08
    icari  -0.08
    _Insert  -0.08
    оÑĢоÑĤ  -0.07
     eskort  -0.07
    neider  -0.07
    .DOM  -0.07
    aze  -0.07
    اصÙĦÙĩ  -0.07
    nect  -0.07
    Positive Logits
     somehow  0.10
     or  0.06
    ÂĿ  0.06
    776  0.06
     blah  0.06
     Thur  0.06
    |array  0.05
    <|end_of_text|>  0.05
     unw  0.05
    ip  0.05
    Act Density 0.083%

    No Known Activations