INDEX
    Explanations

    references to colors, particularly shades of purple, pink, blue, and black

    colors and visual descriptors

    Negative Logits
     queſto       -0.73
     vooz         -0.73
     laſſen       -0.73
    iſten         -0.69
     beſti        -0.68
    <unused41>    -0.68
     zwiſchen     -0.68
    <unused23>    -0.68
    <unused20>    -0.68
    <unused17>    -0.68
    Positive Logits
    ioutil            0.27
     Außer            0.27
     colour           0.27
     occupe           0.25
     deutschen        0.25
     amerikanischen   0.24
     ganzen           0.24
     weißen           0.24
    labelledby        0.23
     color            0.23
    Act Density 0.016%
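    Activation density is usually reported as the percentage of sampled tokens on which the feature fires. At 0.016%, that is about 16 tokens per 100,000. The minimal sketch below assumes activations arrive as a flat list of per-token values and that "fires" means strictly greater than zero. The dashboard's actual threshold and token sample are not stated on this page, and `activation_density` is a hypothetical helper:

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when activation > threshold (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 16 firing tokens out of 100,000.
sample = [0.0] * 100_000
for i in range(16):
    sample[i * 6_000] = 1.5
print(f"{activation_density(sample):.3f}%")  # 0.016%
```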

    No Known Activations