INDEX
Explanations
phrases related to raising awareness or advocating for issues
Negative Logits
brains      -0.15
ter         -0.15
anding      -0.15
aju         -0.15
aji         -0.14
ta          -0.14
ize         -0.14
oe          -0.14
ceptive     -0.13
Td          -0.13
Positive Logits
eyebrows        0.37
awareness       0.29
stakes          0.28
hack            0.27
brows           0.26
Awareness       0.25
alarms          0.23
flags           0.23
consciousness   0.23
eyebrow         0.23
Activations Density: 0.041%