Explanations
references to social and historical issues, particularly those involving race and systemic injustices
Negative Logits
üss          -0.14
responses    -0.13
endale       -0.13
통해         -0.13
Russell      -0.13
empo         -0.13
Elle         -0.13
inin         -0.13
oda          -0.13
Approach     -0.13
Positive Logits
-themed      0.38
-related     0.33
themed       0.30
-focused     0.28
related      0.24
related      0.23
.related     0.23
-theme       0.22
_related     0.22
ê´Ģ볨       0.21
Activations Density: 0.421%