    Explanations

    The neuron activates specifically on the word “enhance.”

    Negative Logits
    Token            Logit
    "141"            -0.08
    " traveled"      -0.07
    (blank)          -0.07
    " Problem"       -0.07
    " looping"       -0.07
    " Sorted"        -0.06
    " lives"         -0.06
    "297"            -0.06
    "_roll"          -0.06
    " logically"     -0.06
    Positive Logits
    Token            Logit
    " enhance"       0.15
    " enhancing"     0.13
    " enhanced"      0.12
    " enhances"      0.12
    "Enh"            0.11
    " enhancement"   0.10
    " Enhanced"      0.10
    " Enh"           0.10
    "-enh"           0.09
    " Enhancement"   0.09
    Activation Density: 0.016%

    No Known Activations
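
    The two logit lists above read as the highest and lowest entries of a per-token logit mapping. The sketch below illustrates that reading only. It assumes the lists are plain top-k and bottom-k selections by logit value, which the page does not state. The names `logit_effects` and `top_k` are hypothetical, the values are copied from the lists, and the one entry whose token is blank is left out.

```python
# Logit values copied from the lists above. The dict layout and the
# names below are illustrative assumptions, not part of the dashboard.
logit_effects = {
    " enhance": 0.15, " enhancing": 0.13, " enhanced": 0.12,
    " enhances": 0.12, "Enh": 0.11, " enhancement": 0.10,
    " Enhanced": 0.10, " Enh": 0.10, "-enh": 0.09,
    " Enhancement": 0.09,
    "141": -0.08, " traveled": -0.07, " Problem": -0.07,
    " looping": -0.07, " Sorted": -0.06, " lives": -0.06,
    "297": -0.06, "_roll": -0.06, " logically": -0.06,
}


def top_k(effects, k, largest=True):
    """Return the k (token, logit) pairs with the largest or smallest logits.

    sorted() is stable, so tokens with equal logits keep insertion order.
    """
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=largest)
    return ranked[:k]


if __name__ == "__main__":
    print("positive:", top_k(logit_effects, 3))
    print("negative:", top_k(logit_effects, 3, largest=False))
```

    Tokens are kept as exact strings, so "Enh" and " Enh" (with a leading space) stay distinct entries, as they do in the lists.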