    Explanations

    color-related descriptions, particularly focusing on the color black

    Negative Logits
    Token            Logit
    " müſſen"        -0.69
    "󠀠"              -0.68
    "Rptr"           -0.67
    " queſto"        -0.67
    " パンチラ"      -0.66
    "<unused51>"     -0.66
    "<unused42>"     -0.66
    "<unused28>"     -0.65
    "[@BOS@]"        -0.65
    "<pad>"          -0.65

    Positive Logits
    Token            Logit
    " dark"          0.96
    " black"         0.89
    " Black"         0.89
    " darkness"      0.86
    "Black"          0.85
    " Dark"          0.84
    " BLACK"         0.84
    "black"          0.84
    "dark"           0.81
    "Dark"           0.80

    Activation Density: 0.515%

    No Known Activations