INDEX
    Explanations

    This neuron activates on occurrences of the token “train” (i.e. it detects mentions of a train).

    Negative Logits
    Token        Logit
    "Kelly"      -0.07
    "elerle"     -0.07
    " caps"      -0.07
    " cort"      -0.06
    "Labor"      -0.06
    "ipple"      -0.06
    " Lily"      -0.06
    "aber"       -0.06
    " Caps"      -0.06
    "oxy"        -0.06
    Positive Logits
    Token          Logit
    " train"       0.16
    " Train"       0.12
    " trains"      0.11
    "train"        0.10
    " cậu"         0.08
    "(train"       0.07
    ".train"       0.07
    "Train"        0.07
    " Steak"       0.07
    " français"    0.07
    Activation Density: 0.008%

    No Known Activations