INDEX
Explanations
phrases related to empowerment
terms related to empowerment and support for marginalized groups
Negative Logits
Goo         -0.73
patch       -0.68
Canaver     -0.67
ago         -0.67
×IJ         -0.66
eda         -0.66
Uniform     -0.65
owitz       -0.63
hound       -0.63
hiba        -0.63
Positive Logits
ments       1.01
Reviewer    0.90
ment        0.85
MENTS       0.77
iences      0.77
mentation   0.75
irlf        0.73
ittees      0.72
empower     0.71
FUL         0.71
Activations Density: 0.038%