INDEX
Explanations
keywords related to legal terms and political figures
phrases related to combating gender bias and promoting equality
Negative Logits
vulner          -0.60
rigging         -0.60
..."            -0.59
respectively    -0.59
prec            -0.56
fame            -0.55
opposite        -0.53
polar           -0.51
persuasion      -0.49
decisive        -0.49
Positive Logits
ashtra          0.70
arius           0.68
arij            0.66
yna             0.65
yn              0.64
Profile         0.64
abus            0.63
ulous           0.62
azel            0.62
ibliography     0.61
Activations Density: 1.642%