INDEX
Explanations
expressions of outrage and frustration regarding social issues and personal experiences
Negative Logits
лату     -0.16
zn       -0.16
不得     -0.15
Depends  -0.14
iem      -0.14
Trivia   -0.14
usch     -0.14
åĶ       -0.14
pron     -0.13
igua     -0.13
Positive Logits
Wake     0.27
wake     0.27
wake     0.26
Wake     0.26
Grow     0.24
Grow     0.22
stop     0.22
why      0.22
Stop     0.21
grow     0.20
Activations Density 0.293%