Explanations
color-related descriptions, particularly focusing on the color black
"black" or "dark"
Negative Logits
Token          Logit
müſſen         -0.69
(not shown)    -0.68
Rptr           -0.67
queſto         -0.67
パンチラ       -0.66
<unused51>     -0.66
<unused42>     -0.66
<unused28>     -0.65
[@BOS@]        -0.65
<pad>          -0.65
Positive Logits
Token          Logit
dark           0.96
black          0.89
Black          0.89
darkness       0.86
Black          0.85
Dark           0.84
BLACK          0.84
black          0.84
dark           0.81
Dark           0.80
Activations Density: 0.515%