INDEX
    Explanations

    negative descriptions about societal issues or institutions

    NEGATIVE LOGITS
     colors         -0.18
    ehler           -0.17
     favorable      -0.17
     theaters       -0.17
     favor          -0.17
     unfavorable    -0.16
     favors         -0.15
    localized       -0.15
     coloring       -0.15
     Colors         -0.15
    POSITIVE LOGITS
     mate           0.32
     bol            0.31
     mates          0.29
     sod            0.28
    blo             0.26
     blo            0.25
     nonce          0.23
    bol             0.23
     proper         0.23
    Oi              0.23
    Act Density 0.652%

    No Known Activations