INDEX
    Explanations

    references to colonialism and related concepts

    Negative Logits
    Token        Logit
    "udy"        -0.17
    "ouver"      -0.16
    "aly"        -0.15
    "repid"      -0.14
    "rias"       -0.14
    "reds"       -0.14
    "óm"         -0.14
    "lik"        -0.14
    "conomy"     -0.14
    " Elect"     -0.14
    Positive Logits
    Token        Logit
    "inch"       0.17
    " TMPro"     0.16
    "خانه"       0.14
    "-era"       0.14
    "tü"         0.14
    "ors"        0.14
    "cratch"     0.14
    " Farrell"   0.13
    "심"         0.13
    "あり"       0.13
    Activation Density: 0.030%

    No Known Activations