Explanations
words related to political and social issues, activism, and societal challenges
Negative Logits
Token           Logit
etheless        -0.71
also            -0.53
Canaver         -0.52
yssey           -0.50
GOODMAN         -0.50
evidence        -0.48
shapeshifter    -0.46
IBLE            -0.46
blance          -0.46
zik             -0.46
Positive Logits
Token           Logit
etc             0.73
or              0.55
®,              0.49
blah            0.48
versus          0.48
)</             0.47
)/              0.44
swe             0.44
!),             0.44
baths           0.44
Activations Density: 11.256%