Explanations
terms related to nondiscrimination
terms related to non-discrimination and social justice issues
Negative Logits
================    -0.77
======    -0.75
Package    -0.70
Syndicate    -0.68
ppa    -0.67
================================================================    -0.65
ularity    -0.65
shotguns    -0.62
raltar    -0.62
Blaze    -0.62
Positive Logits
ouble    1.03
ocument    1.01
oubted    0.98
imensional    0.95
irect    0.94
ere    0.91
ynamic    0.91
emet    0.89
aniel    0.86
isc    0.86
Activations Density: 0.009%