INDEX
    Explanations

    mentions of the color red

    Negative Logits
    Token           Logit
    " gray"         -0.16
    " blackColor"   -0.16
    " grey"         -0.16
    " Haram"        -0.15
    "ightly"        -0.15
    "led"           -0.15
    "asted"         -0.15
    " Gray"         -0.14
    " turquoise"    -0.14
    "egrated"       -0.14
    Positive Logits
    Token           Logit
    "oub"           0.28
    "dest"          0.27
    "acted"         0.26
    "dish"          0.26
    "/red"          0.24
    "emption"       0.23
    "empt"          0.23
    "-hot"          0.23
    "shift"         0.22
    "uces"          0.22
    Activation Density: 0.037%

    No Known Activations