INDEX
    Explanations

    references to colors in various contexts

    Negative Logits
    Token (quoted)       Logit
    "ftagPool"           -0.82
    "SequentialGroup"    -0.74
    " <<<<<<<<<<<<<<"    -0.74
    " مشين"              -0.69
    "[@BOS@]"            -0.65
    "<pad>"              -0.65
    "<unused8>"          -0.65
    "<unused41>"         -0.65
    "<unused3>"          -0.65
    "<unused14>"         -0.65
    Positive Logits
    Token (quoted)       Logit
    " color"             1.09
    " Color"             0.99
    "Color"              0.93
    "color"              0.88
    " colour"            0.87
    " COLOR"             0.86
    " colors"            0.85
    "getColor"           0.85
    " Colour"            0.83
    "Colour"             0.82
    Activation Density: 0.028%

    No Known Activations