INDEX
    Explanations

    punctuation

    The neuron detects occurrences of the word “explanation” (e.g. in “do not give any explanation”).

    Negative Logits
    Token           Logit
    " th�"          -0.06
    " طل"           -0.06
    (blank)         -0.06
    " WIDTH"        -0.06
    (blank)         -0.06
    " CFR"          -0.06
    " wol"          -0.06
    "Seen"          -0.06
    " thinly"       -0.06
    "タル"          -0.06

    Positive Logits
    Token           Logit
    ".vertical"     0.07
    "Static"        0.07
    " Algorithm"    0.06
    "_gamma"        0.06
    " Sampling"     0.06
    " Ге"           0.06
    "ahead"         0.06
    "’da"           0.06
    "ером"          0.06
    " boyfriend"    0.06

    Activation Density: 0.008%

    No Known Activations