INDEX
    Explanations

    statements related to social or political commentary, particularly those involving moral or ethical concerns

    Negative Logits
     sac         -0.15
    itol         -0.15
    ingerprint   -0.14
    ÑĢе          -0.14
    nÃŃ          -0.14
    _mirror      -0.14
     Mor         -0.14
    434          -0.13
    né           -0.13
    té           -0.13
    Positive Logits
    ÐŁÐŀ         0.17
    elps         0.16
    adas         0.16
    .examples    0.15
    éIJĺ          0.15
    ìĹ¼          0.15
    ibel         0.15
    .Elements    0.14
    èĪĴ          0.14
    xis          0.14
    Activation Density: 0.366%

    No Known Activations