Explanations
phrases related to treatment and fairness towards individuals or groups
Negative Logits
Hecht          -0.59
helle          -0.51
Giordano       -0.50
bénéficiaire   -0.50
Ubic           -0.49
interlocking   -0.49
oxu            -0.49
XSSF           -0.48
ricist         -0.47
tadiene        -0.47
Positive Logits
treat          1.13
treated        1.11
treating       1.08
treated        1.04
Treat          1.03
Treat          0.98
TREAT          0.97
treat          0.96
Treated        0.96
treats         0.95
Activations Density 0.173%