Explanations
terms related to non-profit organizations and non-discriminatory language
Negative Logits
LabelTagHelper   -0.75
vettor           -0.71
proprement       -0.69
rumahnya         -0.68
supérieures      -0.66
respectivement   -0.66
scuro            -0.65
DockStyle        -0.64
particulières    -0.62
doulou           -0.62
Positive Logits
non    3.65
Non    3.44
Non    3.41
non    3.31
NON    3.15
NON    2.90
非     2.58
非     2.13
nons   2.02
Nons   1.96
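The logit lists read like a "logit lens" view of the feature: each value is assumed here to be the dot product of the feature's decoder direction with one token's column of the model's unembedding matrix, with the extremes listed. The page does not say how its numbers are computed, so the following is only a minimal sketch under that assumption. The names `logit_effects`, `top_and_bottom`, the toy 2-dimensional vectors and their values are all hypothetical, and real dashboards may also apply normalization or other preprocessing that is omitted here.

```python
from typing import Dict, List, Tuple

# Hypothetical toy types: a feature's decoder direction (length d_model) and an
# unembedding given as token -> column of W_U (also length d_model).
Vector = List[float]


def logit_effects(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Project the decoder direction onto each token's unembedding column."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy example; the numbers are invented for illustration only.
    decoder_dir = [1.0, 0.5]
    unembed = {
        "non": [3.0, 1.0],
        "Non": [2.0, 2.0],
        "scuro": [-1.0, 0.0],
        "DockStyle": [0.0, -1.0],
    }
    positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("Positive:", positive)  # Positive: [('non', 3.5), ('Non', 3.0)]
    print("Negative:", negative)  # Negative: [('scuro', -1.0), ('DockStyle', -0.5)]
```

Under this reading, a feature whose positive list is dominated by "non" variants directly pushes the model toward predicting those tokens when it fires.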
Activation density: 0.078%
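Activation density is commonly reported as the share of token positions on which a feature is active. Assuming "active" means an activation strictly above zero, 0.078% corresponds to roughly 78 firing positions per 100,000 tokens, so this feature is very sparse. The sketch below illustrates that calculation; the function name `activation_density` and its `threshold` parameter are hypothetical, and the exact threshold the dashboard uses is not stated on the page.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # assumption: "active" means strictly above the threshold
            active += 1
    return 100.0 * active / total if total else 0.0


if __name__ == "__main__":
    # Invented sample: 78 firing positions out of 100,000 tokens.
    acts = [1.0] * 78 + [0.0] * (100_000 - 78)
    density = activation_density(acts)
    print(f"Activation density: {density:.3f}%")  # Activation density: 0.078%
```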