    Explanations

    references to the color red

    occurrences of the word "red."

    Negative Logits
    Token          Logit
    "ILA"          -0.84
    " Lank"        -0.80
    "UGH"          -0.79
    "ETHOD"        -0.78
    "ernel"        -0.77
    "Ö¼"           -0.76
    "agall"        -0.76
    "HAEL"         -0.74
    "llah"         -0.72
    "Technical"    -0.71
    Positive Logits
    Token          Logit
    "rawn"         1.17
    "neck"         1.12
    "oubt"         1.10
    "efined"       1.08
    "oub"          1.02
    "headed"       1.01
    " velvet"      0.99
    "iscovered"    0.96
    "iscovery"     0.93
    "iscover"      0.91
    Activation Density 0.024%

    No Known Activations