INDEX
    Explanations

    negative statements or concepts related to relationships and social issues

    Negative Logits
     [...]
    -0.15
     Potion
    -0.14
    ...]↵↵
    -0.14
    .nano
    -0.14
     recent
    -0.14
    ()['
    -0.14
    ahrain
    -0.14
    inati
    -0.14
    swick
    -0.14
     [...]↵↵
    -0.13
    Positive Logits
    libs
    0.17
    0.16
    ideo
    0.15
    306
    0.14
     illeg
    0.14
     Americans
    0.14
     conservatives
    0.14
     Barack
    0.14
    sob
    0.14
     because
    0.14
    Act Density 0.001%

    No Known Activations