    Explanations

    hunger strikes and suicides

    Negative Logits
    (Tokens shown in quotes; a leading space is part of the token.)
    "бе"            0.40
    "צרים"          0.40
    "essentially"   0.39
    " במה"          0.38
    " выход"        0.38
    "🚪"            0.38
    "𝗅"             0.37
    "closed"        0.37
    "ería"          0.37
    "snp"           0.37
    Positive Logits
    " self"          0.53
    " burns"         0.49
    " imm"           0.47
    " inm"           0.46
    " Self"          0.46
    " burn"          0.46
    "self"           0.46
    (token missing)  0.44
    " Selbst"        0.44
    " temperatures"  0.43
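Lists like the negative and positive logits above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and taking the highest- and lowest-scoring tokens. The page does not say how its scores were computed, so this is a minimal sketch under assumptions: the feature is treated as a sparse-autoencoder latent with a decoder vector `w_dec_row` and an unembedding matrix `w_unembed`. The function name `top_logit_effects` is hypothetical, and the toy numbers are made up for illustration.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumptions (hypothetical, not taken from the page):
      w_dec_row: (d_model,) decoder vector of the feature (SAE latent).
      w_unembed: (d_model, vocab_size) unembedding matrix.
      vocab: list of vocab_size token strings.
    Returns (positive, negative) lists of (token, score), strongest first.
    """
    scores = w_dec_row @ w_unembed      # (vocab_size,): one score per token
    order = np.argsort(scores)          # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with made-up numbers (d_model=2, vocab_size=4).
vocab = [" self", " burns", "closed", "snp"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.5, 0.4, -0.3, -0.2],
                      [0.1, 0.2,  0.3,  0.4]])
positive, negative = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print(positive)  # [(' self', 0.5), (' burns', 0.4)]
print(negative)  # [('closed', -0.3), ('snp', -0.2)]
```

The sketch returns signed scores; how the page signs or scales the values in its "Negative Logits" list is not stated.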
    Activation Density: 0.004%
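Assuming activation density means the fraction of sampled token positions at which the feature's activation exceeds zero (the page does not state the sample or the threshold), 0.004% works out to about 4 firings per 100,000 tokens. A minimal sketch of that calculation, using a hypothetical `activation_density` helper:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature fires.

    Assumption: "fires" means activation strictly greater than `threshold`.
    """
    acts = np.asarray(acts)
    return np.count_nonzero(acts > threshold) / acts.size

# Toy example: 4 nonzero activations among 100,000 token positions.
acts = np.zeros(100_000)
acts[:4] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.004%"
```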

    No Known Activations