INDEX
Explanations
instances of words related to colors
references to racial and cultural identifiers or stereotypes
Negative Logits
Token       Logit
igious      -0.72
FUN         -0.67
Services    -0.66
izarre      -0.66
atis        -0.65
Internal    -0.64
Effective   -0.64
Specific    -0.64
bably       -0.64
ãĤ¡         -0.63
Positive Logits
Token       Logit
stripes     0.92
striped     0.80
stripe      0.79
oxide       0.77
flakes      0.76
Metallic    0.74
syndrome    0.73
cloth       0.73
stain       0.71
berries     0.70
Activations Density 0.303%
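
For readers unfamiliar with the density figure, here is a minimal sketch of one common way such a number is computed: the fraction of token positions on which the feature's activation is nonzero. This is an assumption about the definition, not something the page states. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumed definition: density = (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 3 of which activate the feature.
toy_activations = [0.0] * 997 + [1.2, 0.4, 2.7]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.303% would mean the feature fires on roughly 3 of every 1000 sampled tokens.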