Explanations
relationships involving power dynamics, roles, and social structures
Negative Logits
Token     Logit
illez     -0.16
ebra      -0.16
irth      -0.16
едини     -0.16
IRTH      -0.15
gid       -0.15
gid       -0.14
raki      -0.14
gree      -0.14
ihu       -0.14
Positive Logits
Token     Logit
vs        0.19
-vers     0.18
versus    0.17
-vs       0.16
isp       0.16
followed  0.15
Outdoor   0.15
Uvs       0.15
auc       0.15
Ã         0.15
Activations Density: 0.175%
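
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that activation density means the fraction of token positions on which the feature has a nonzero activation; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 7 firing positions out of 4000 tokens,
# chosen so the ratio matches the 0.175% figure reported above.
sample = [0.0] * 3993 + [1.2] * 7
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.175% corresponds to the feature firing on roughly 7 of every 4000 tokens.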