INDEX
    Explanations

    The neuron activates on linguistics and NLP terminology (e.g. “coreferential,” “grammatical,” “categorizing,” “meaning”), that is, on mentions of natural-language-processing concepts and methods.

    Negative Logits
     Allocate              -0.06
    -based                 -0.06
    .currentThread         -0.06
    ical                   -0.06
     passport              -0.06
    Initialized            -0.06
     jurisdiction          -0.06
     quicker               -0.06
    (token not shown)      -0.05
     cracked               -0.05
    Positive Logits
    (".                    0.07
    μένα                   0.06
    ossier                 0.06
    (token not shown)      0.06
     (*.                   0.06
     otel                  0.06
     unins                 0.06
    ..↵                    0.06
    /',↵                   0.06
    $('.                   0.06
    Act Density 0.115%

    No Known Activations