INDEX
Explanations
phrases related to activism and social change
phrases related to manipulation or altering perceptions
Negative Logits
andowski     -0.75
terday       -0.66
archive      -0.66
ynthesis     -0.66
apo          -0.64
coron        -0.60
static       -0.60
ylan         -0.59
arag         -0.59
Stew         -0.59
Positive Logits
yourselves    1.30
yourself      1.29
Yourself      1.03
wisely        1.03
your          0.86
!             0.86
responsibly   0.85
!:            0.85
!'            0.83
cknow         0.83
Activation Density: 0.733%