INDEX
    Explanations

    The neuron detects short affirmative response tokens (instances of agreement/“yes”-type replies) in the text.

    Negative Logits
    348
    Token (single-quoted so leading spaces stay visible)    Logit
    'що'            -0.07
    ' storia'       -0.07
    'perience'      -0.07
    ' Anthrop'      -0.07
    'cedure'        -0.06
    'führ'          -0.06
    ' noir'         -0.06
    ' citt'         -0.06
    'placement'     -0.06
    Positive Logits
    Token (single-quoted so leading spaces stay visible)    Logit
    ' yes'          0.12
    ' YES'          0.09
    'Yes'           0.09
    ' Yes'          0.09
    '"Yes'          0.08
    (token missing) 0.07
    ' positive'     0.07
    '.Yes'          0.07
    ' RS'           0.07
    ' yAxis'        0.07
    Act Density 0.037%
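
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that "Act Density" means the fraction of sampled tokens on which the neuron's activation is above zero; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from this page or from any tool's API.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" = share of tokens on which the neuron fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: the neuron fires on 1 of 10,000 tokens.
sample = [0.0] * 9999 + [1.3]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that reading, a density of 0.037% would correspond to the neuron firing on roughly 37 of every 100,000 tokens.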

    No Known Activations