INDEX
    Explanations

    references to political figures and their actions or statements

    Negative Logits
    "lich"     -0.15
    "anel"     -0.14
    " 事"      -0.14
    "lam"      -0.14
    ".nasa"    -0.14
    "esper"    -0.13
    "crap"     -0.13
    "항"       -0.13
    " Rain"    -0.13
    "イル"     -0.13
    Positive Logits
    " should"     0.29
    "should"      0.24
    " Should"     0.23
    "Should"      0.23
    " ought"      0.22
    " shouldn"    0.20
    ".should"     0.17
    " deber"      0.17
    "920"         0.17
    " SHOULD"     0.17
    Activation Density 0.179%

    No Known Activations