    Explanations

    This neuron activates on occurrences of the substring "ice", in effect acting as a detector for the token "ice".

    Negative Logits
    Token             Logit
    " Ferdinand"      -0.07
    " Freund"         -0.07
    "164"             -0.07
    " comprehend"     -0.07
    " srov"           -0.07
    " Tal"            -0.07
    " Fon"            -0.06
    " transmitted"    -0.06
    "344"             -0.06
    "ενο"             -0.06
    Positive Logits
    Token             Logit
    " ice"            0.18
    " Ice"            0.16
    "Ice"             0.16
    " ICE"            0.11
    "ices"            0.10
    "ice"             0.09
    " icy"            0.09
    "IC"              0.08
    "ici"             0.08
    " iceberg"        0.08
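
As a rough sanity check of the explanation against the positive-logit list above, the sketch below counts how many of the listed tokens contain "ice" when case is ignored. The helper name `contains_substring` and the choice of a case-insensitive match are illustrative assumptions, not part of the dashboard itself.

```python
# Top positive-logit tokens as listed on this page (leading spaces preserved).
positive_logits = [
    (" ice", 0.18), (" Ice", 0.16), ("Ice", 0.16), (" ICE", 0.11),
    ("ices", 0.10), ("ice", 0.09), (" icy", 0.09), ("IC", 0.08),
    ("ici", 0.08), (" iceberg", 0.08),
]


def contains_substring(token: str, needle: str = "ice") -> bool:
    """Hypothetical helper: case-insensitive substring test."""
    return needle in token.lower()


# Split the listed tokens by whether they contain the substring.
matching = [tok for tok, _ in positive_logits if contains_substring(tok)]
non_matching = [tok for tok, _ in positive_logits if not contains_substring(tok)]

print(f"{len(matching)} of {len(positive_logits)} tokens contain 'ice'")
print("without 'ice':", non_matching)
```

On this list, seven of the ten tokens contain the substring; " icy", "IC", and "ici" do not, so the promoted tokens appear to include near-variants as well as exact matches.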
    Act Density 0.007%

    No Known Activations