Explanations
discussions about the treatment and rights of marginalized groups, particularly focusing on the injustices they face
Negative Logits
stället       -0.58
silicona      -0.54
chrétienne    -0.54
zło           -0.54
cref          -0.53
Зачем         -0.51
افظة          -0.51
spyOn         -0.49
frey          -0.48
jonal         -0.47
Positive Logits
treatment     0.69
treatment     0.68
待遇          0.67
treated       0.67
Treatment     0.62
treated       0.61
Treatment     0.60
TREATMENT     0.60
Treated       0.59
justice       0.58
Activations Density 0.486%