INDEX
Explanations
phrases indicating disagreement or opposition
statements regarding consumer choices and political stances on social issues
Negative Logits
Token       Logit
WAR         -0.59
Released    -0.59
RL          -0.58
NYC         -0.55
Born        -0.53
Weird       -0.53
agonists    -0.53
Pg          -0.52
Yor         -0.52
Whedon      -0.52
Positive Logits
Token       Logit
)."         1.29
.")         1.14
.""         1.13
.'"         1.11
."[         1.01
'."         1.00
…"          0.99
."          0.99
..."        0.96
}"          0.96
Activations Density 1.324%