Explanations
references to political and social issues, discussions, and decisions
references to collective actions and the use of "we" in discussions about societal issues
Negative Logits
aults         -0.73
Rowe          -0.68
REDACTED      -0.65
amina         -0.64
imum          -0.59
cum           -0.59
Kush          -0.59
Publication   -0.59
Spec          -0.58
imo           -0.58
Positive Logits
akening       1.22
're           1.21
've           1.19
shouldn       1.12
owe           1.11
ought         1.04
need          1.02
ourselves     1.02
cannot        1.01
asel          1.01
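The two lists rank tokens by how strongly this feature pushes their logits down or up. The sketch below shows how such top-k and bottom-k lists can be pulled out of a per-token logit-effect vector. It assumes, without confirmation from the source, that each value is the feature's direct logit effect for that token (for example the decoder direction projected onto the unembedding). The helper name `top_and_bottom_k` is hypothetical, and the inputs are a small subset of the values listed above.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's logit effect.
# Assumption: logit_effect[i] is the feature's effect on token vocab[i]'s logit
# (e.g. decoder direction projected onto the unembedding); not confirmed by the source.

def top_and_bottom_k(vocab, logit_effect, k):
    """Return (positive, negative) lists of (token, value) pairs, k items each."""
    if len(vocab) != len(logit_effect):
        raise ValueError("vocab and logit_effect must have the same length")
    pairs = list(zip(vocab, logit_effect))
    # Largest effects first for the positive list.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    # Most negative effects first for the negative list.
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Subset of the values from the dashboard above.
vocab = ["'re", "ought", "Rowe", "aults"]
effects = [1.21, 1.04, -0.68, -0.73]
pos, neg = top_and_bottom_k(vocab, effects, 2)
print(pos)
print(neg)
```

Sorting the full list twice keeps the sketch simple; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.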
Activation Density: 0.206%
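A density of 0.206% means the feature fires on roughly 2 of every 1,000 tokens. The source does not state the threshold or the token sample behind this figure. The sketch below assumes that density is the percentage of tokens whose activation is strictly above zero. The function name `activation_density` and the toy activations are hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of tokens on which
# a feature's activation exceeds a threshold (assumed to be 0 here).

def activation_density(activations, threshold=0.0):
    """Return the percentage of entries in `activations` greater than `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 1,000 tokens, of which 2 activate the feature.
acts = [0.0] * 998 + [1.3, 0.4]
density = activation_density(acts)
print(f"{density:.3f}%")
```

With 2 active tokens out of 1,000 this gives 0.2%, close to the 0.206% reported for this feature.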