Explanations
references to hair and its characteristics, as well as associations with hearing
Negative Logits
hair      -0.84
hair      -0.76
cheveux   -0.67
Hair      -0.65
Hair      -0.64
HAIR      -0.59
Haare     -0.57
hairs     -0.54
haired    -0.52
頭髮      -0.52
Positive Logits
dressing  0.75
dress     0.75
loss      0.71
dresser   0.64
Loss      0.63
loss      0.63
dresser   0.63
piece     0.62
rrggbb    0.61
pieces    0.61
Activations Density: 0.196%