INDEX
Explanations
words related to disparities and discrepancies
terms related to inequality and differences between groups
Negative Logits
sworn            -0.69
cend             -0.68
guided           -0.66
safe             -0.63
bark             -0.63
Kiss             -0.63
Commands         -0.63
assad            -0.63
help             -0.62
Tag              -0.62
Positive Logits
disparity         3.01
discrepancy       2.88
discrepancies     2.68
disparities       2.54
imbalance         2.39
mismatch          2.09
inconsistencies   2.07
inconsistency     2.00
disproportion     1.96
discrep           1.92
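On feature dashboards of this kind, the positive and negative logit lists are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and ranking tokens by the result. The sketch below assumes that definition and ignores details such as layer-norm folding. The function names, the 3-dimensional vectors, and their values are hypothetical toy data, not taken from the real model.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot product of the feature's decoder direction
    with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_k(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return positive, negative

# Toy example with made-up numbers (assumption: not the real model's weights).
decoder_dir = [1.0, 0.5, -0.2]
unembed = {
    "disparity": [2.0, 1.0, 0.0],
    "mismatch": [1.0, 0.5, 0.5],
    "safe": [-0.5, 0.0, 1.0],
    "help": [0.0, -1.0, 0.0],
}
pos, neg = top_k(logit_effects(decoder_dir, unembed), k=2)
print([tok for tok, _ in pos])  # most positive first
print([tok for tok, _ in neg])  # most negative first
```

With these toy vectors, "disparity" and "mismatch" come out on top and "safe" and "help" at the bottom, mirroring the shape of the lists above without reproducing their values.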
Activations Density 0.035%
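Activation density is usually reported as the share of sampled tokens on which the feature fires, so 0.035% corresponds to about 35 tokens in every 100,000. The sketch below assumes "fires" means an activation strictly above zero; the `threshold` parameter and the synthetic sample are hypothetical.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`
    (assumed definition; dashboards may use a different cutoff)."""
    if not activations:
        raise ValueError("need at least one token")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Synthetic sample: the feature fires on 7 of 20,000 tokens.
sample = [0.0] * 19_993 + [1.2] * 7
print(activation_density(sample))  # 7 / 20,000 = 0.035%
```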