INDEX
    Explanations

    This neuron specifically detects occurrences of the word “original” (including its subword pieces) in the text.

    Negative Logits
    -axis         -0.06
     neuen        -0.06
     Approx       -0.06
    ]>=           -0.06
     skept        -0.06
     awarded      -0.06
    δόν           -0.06
                  -0.06
    ATEST         -0.06
     pd           -0.06
    Positive Logits
    Chi           0.08
    Illuminate    0.07
     mattresses   0.07
    CLE           0.06
    .constant     0.06
    MAR           0.06
    .CharField    0.06
    ')</          0.06
    .Agent        0.06
     chair        0.06
    Act Density 0.016%

    No Known Activations