    Explanations

    themes related to social inequality and class exploitation

    Negative Logits
    éĢĶ
    -0.17
    lds
    -0.16
    ichert
    -0.16
    pery
    -0.16
    zyst
    -0.15
    rons
    -0.15
    .recycle
    -0.15
     ÑĤемпеÑĢаÑĤÑĥÑĢа
    -0.14
    ture
    -0.14
    997
    -0.14
    Positive Logits
    è§Ĵ
    0.19
     hol
    0.17
     Watkins
    0.17
    masters
    0.15
    eros
    0.15
    ÑĢÑĮ
    0.14
     Hol
    0.14
    urv
    0.14
     Dra
    0.14
    inya
    0.14
    Activation Density 0.333%

    No Known Activations