Explanations
categories or classifications related to social issues and gender identity
Negative Logits
rary  -0.16
imu  -0.15
Pt  -0.14
ElapsedTime  -0.14
Hector  -0.14
Worker  -0.14
Bib  -0.13
squ  -0.13
.datab  -0.13
sector  -0.13
Positive Logits
inx  0.16
شم  0.16
velt  0.16
hores  0.15
xis  0.15
bbing  0.15
asley  0.15
Diet  0.15
removeFromSuperview  0.14
vern  0.14
Activations Density 0.009%