    Explanations

    phrases related to raising awareness or advocating for issues

    Negative Logits
    brains      -0.15
    ter         -0.15
    anding      -0.15
    aju         -0.15
    aji         -0.14
    ta          -0.14
    ize         -0.14
    oe          -0.14
    ceptive     -0.13
    Td          -0.13
    Positive Logits
     eyebrows       0.37
     awareness      0.29
     stakes         0.28
     hack           0.27
     brows          0.26
     Awareness      0.25
     alarms         0.23
     flags          0.23
     consciousness  0.23
     eyebrow        0.23
    Activation Density: 0.041%

    No Known Activations