INDEX
Explanations
phrases related to consequences and actions, especially those involving societal or political implications
references to abstract concepts or issues related to societal problems
Negative Logits
Yel         -0.68
Spoon       -0.66
Pirates     -0.63
estern      -0.59
marines     -0.57
Sims        -0.57
aughed      -0.57
Aluminum    -0.56
UGH         -0.56
Suzuki      -0.56
Positive Logits
oneself     0.99
self        0.97
ourselves   0.93
selves      0.87
chy         0.84
alian       0.84
anew        0.83
yourself    0.81
selves      0.80
obe         0.80
Activations Density 0.215%