INDEX
    Explanations

    themes related to social justice and equality

    Negative Logits
    552
    -0.14
    atcher
    -0.14
    pedia
    -0.14
    ziel
    -0.12
     Instructions
    -0.12
    亜
    -0.12
    ους
    -0.12
     sis
    -0.12
     pairs
    -0.12
    ŀæĢ§
    -0.12
    Positive Logits
     these
    0.76
    这些
    0.66
    these
    0.65
     These
    0.59
    These
    0.58
     THESE
    0.57
     этих
    0.45
     těchto
    0.43
     estos
    0.41
     bunlar
    0.41
    Act Density 0.992%

    No Known Activations