    Explanations

    The neuron activates on occurrences of the word “control.”

    Negative Logits
    Token         Logit
    " sage"       -0.08
    "-May"        -0.07
    " passage"    -0.07
    " May"        -0.07
    " eleven"     -0.07
    " května"     -0.06
    " page"       -0.06
    "Whenever"    -0.06
    "าษ"          -0.06
    " srpna"      -0.06
    Positive Logits
    Token         Logit
    " control"    0.16
    "Control"     0.16
    " Control"    0.15
    "control"     0.15
    " controls"   0.12
    " Controls"   0.11
    "(Control"    0.11
    "_control"    0.11
    " CONTROL"    0.11
    ".control"    0.11
    Activation Density: 0.065%

    No Known Activations