INDEX
Explanations
mentions of colors and color-related descriptors
Negative Logits
Defenders     -0.66
Fund          -0.65
Privacy       -0.65
Reply         -0.64
Cosponsors    -0.63
deductible    -0.60
WHO           -0.59
VK            -0.59
VOL           -0.59
QUEST         -0.59
Positive Logits
icol          1.35
ás            1.03
osph          0.98
umbered       0.98
osity         0.92
hyde          0.89
umbers        0.87
orescent      0.85
onial         0.85
ately         0.84
Activations Density 0.011%