    Explanations

    The neuron activates specifically on words containing the root “trap” (e.g., trap, trapping, trapper, trapdoors, traps).

    Negative Logits
    Token           Logit
    " Miche"        -0.07
    "Cole"          -0.07
    "这样的"        -0.06
    "oud"           -0.06
    "ied"           -0.06
    " cole"         -0.06
    " شهرد"         -0.06
    " mileage"      -0.06
    " Cleveland"    -0.06
    "date"          -0.06

    Positive Logits
    Token           Logit
    " Trap"         0.15
    " trap"         0.13
    " trapped"      0.12
    " traps"        0.11
    "Trap"          0.10
    "trap"          0.09
    " trapping"     0.09
                    0.07
    "rap"           0.07
    "strap"         0.07

    Activation Density: 0.005%

    No Known Activations
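
    The explanation and the positive-logit list can be compared with a short check. The sketch below is only an illustration: it assumes the logit lists follow the usual convention of naming the output tokens the neuron most promotes, the helper name contains_root is hypothetical, and the one 0.07 entry whose token is missing from the listing is left out.

```python
# Positive-logit tokens and values copied from the listing above.
# Leading spaces are part of the token. The 0.07 entry with no
# visible token is omitted because its token is unknown.
positive_logits = [
    (" Trap", 0.15),
    (" trap", 0.13),
    (" trapped", 0.12),
    (" traps", 0.11),
    ("Trap", 0.10),
    ("trap", 0.09),
    (" trapping", 0.09),
    ("rap", 0.07),
    ("strap", 0.07),
]


def contains_root(token: str, root: str = "trap") -> bool:
    """Hypothetical helper: case-insensitive substring test for the root."""
    return root in token.strip().lower()


# Split the listed tokens by whether they contain the root "trap".
matching = [tok for tok, _ in positive_logits if contains_root(tok)]
non_matching = [tok for tok, _ in positive_logits if not contains_root(tok)]

print("contain 'trap':", matching)
print("do not contain 'trap':", non_matching)
```

    Under these assumptions, "strap" counts as a match only because the check is a plain substring test; a stricter, morphology-aware test would treat it differently.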