INDEX
Explanations
references to disparities or gaps between groups or conditions
Negative Logits
Token        Logit
buiten       -0.53
präsident    -0.52
UTERS        -0.50
一回         -0.47
corso        -0.45
suelos       -0.43
geworden     -0.43
ğu           -0.43
彤           -0.42
utnik        -0.42
Positive Logits
Token        Logit
gap          1.94
Gap          1.70
gap          1.69
Gap          1.68
gaps         1.54
Gaps         1.44
difference   1.42
GAP          1.39
gaps         1.35
disparity    1.34
Activations Density: 0.768%