INDEX
Explanations
action words pertaining to challenging or breaking norms
Negative Logits
Solution        -0.81
pmwiki          -0.77
kas             -0.73
Nanto           -0.70
Solution        -0.70
consulted       -0.68
assets          -0.68
emed            -0.66
umerable        -0.66
avail           -0.65
Positive Logits
precon          1.07
stereotypes     1.04
misconceptions  1.02
adversity       1.02
poverty         0.95
clutter         0.94
stigma          0.94
oppressive      0.92
tyranny         0.90
inequalities    0.89
Activations Density 0.369%
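The page does not define activation density. A common reading, assumed here, is the share of sampled token positions on which the feature activates above zero. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the toy data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" on a token when its activation
    is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 4 of them with non-zero activation.
acts = [0.0] * 996 + [0.3, 1.2, 0.7, 2.1]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.400%
```

Under this reading, a density of 0.369% would mean the feature fires on roughly 37 of every 10,000 sampled tokens.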