INDEX
    Explanations

    phrases related to social justice and activism

    instances of the end-of-text token

    Negative Logits
    Token             Logit
    " hindsight"      -0.94
    " quir"           -0.90
    " glitch"         -0.80
    " fuzz"           -0.79
    " quirks"         -0.78
    " glitches"       -0.78
    "abase"           -0.77
    " accidentally"   -0.75
    " detecting"      -0.74
    " clust"          -0.74
    Positive Logits
    Token             Logit
    " Amen"           1.25
    "Peace"           1.19
    " Quran"          1.08
    "peace"           0.99
    "Therefore"       0.98
    "å¿"              0.92
    "ðŁ"              0.89
    " Peace"          0.89
    "Pope"            0.89
    "âĢķ"             0.88
    Act Density 0.477%

    No Known Activations