INDEX
Explanations
references to social criticism and commentary related to political themes
Negative Logits
_MP      -0.16
asing    -0.15
ToProps  -0.14
åIJ¹     -0.14
onas     -0.13
ania     -0.13
oÅĪ      -0.13
voks     -0.13
ydk      -0.13
uhn      -0.13
Positive Logits
message     0.23
critique    0.23
themes      0.23
sat         0.22
questions   0.21
crit        0.21
addressed   0.21
messages    0.20
reflection  0.20
Message     0.20
Activations Density: 0.349%