Explanations
expressions of social justice and activism
Negative Logits
Token    Logit
-        -0.22
"        -0.21
-↵↵      -0.18
...      -0.17
ldr      -0.16
'        -0.15
.        -0.15
："       -0.15
’        -0.14
oeff     -0.14
Positive Logits
Token    Logit
oma      0.17
,↵       0.16
'gc      0.15
panel    0.15
facade   0.14
.nano    0.14
↵        0.14
narr     0.14
ham      0.14
Sophia   0.14
Activations Density: 0.002%