    Explanations

    phrases related to actions or beliefs concerning society and politics

    concepts related to power dynamics and responsibility

    Negative Logits
     agre       -0.61
    ".          -0.56
    '.          -0.55
    +.          -0.55
    !.          -0.53
    !".         -0.51
     ende       -0.51
    zik         -0.51
    cms         -0.51
    .).         -0.51
    Positive Logits
    pires       0.72
    pired       0.51
     nutshell   0.45
     depends    0.45
     )]         0.43
     resides    0.42
     vanished   0.41
     hangs      0.41
     resided    0.41
     Middle     0.41
    Activation Density: 2.645%

    No Known Activations