INDEX
    Explanations

    mentions of specific colors

    Negative Logits
    Token         Logit
    "OHN"         -0.82
    "ammad"       -0.80
    "atican"      -0.77
    "ITAL"        -0.76
    "CHA"         -0.75
    "IJ"          -0.74
    "CHAT"        -0.72
    "Xi"          -0.71
    " Kaplan"     -0.70
    "PER"         -0.69
    Positive Logits
    Token         Logit
    " colours"    1.24
    " colour"     1.20
    "colour"      1.13
    " palette"    0.97
    " Colour"     0.94
    " stripe"     0.87
    " coloured"   0.87
    "anguage"     0.86
    " colors"     0.86
    " stripes"    0.82
    Activation Density: 0.010%

    No Known Activations