INDEX
Explanations
terms related to social justice and advocacy for marginalized groups
Negative Logits
Polk      -0.84
Caleb     -0.79
Mirage    -0.78
Villa     -0.78
Xiao      -0.77
Rez       -0.76
Meter     -0.75
Mes       -0.74
Axel      -0.73
Xiang     -0.73
Positive Logits
nob       1.23
every     1.21
null      1.21
prep      1.21
ever      1.19
other     1.19
sure      1.18
average   1.16
new       1.15
subject   1.15
Activations Density: 0.103%