INDEX
    Explanations

    I'm sorry, but based on the activations provided, I'm unable to determine a specific pattern or theme that neuron 4 is looking for in the text.

    special characters or non-standard symbols in the text

    Negative Logits
    Token           Logit
    "geries"        -0.91
    "raints"        -0.72
    " background"   -0.70
    " distracted"   -0.69
    " foreground"   -0.69
    " offending"    -0.68
    " plain"        -0.68
    "gery"          -0.67
    " Plain"        -0.67
    " dividing"     -0.67
    Positive Logits
    Token           Logit
    "Ħ"             1.41
    "ij"            1.14
    "¸"             1.10
    "ļ"             1.06
    "ĸ"             1.06
    "и"             1.03
    "Ĺ"             1.03
    (not rendered)  1.02
    "¼"             1.01
    "¾"             1.00
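    Logit lists like the two above are commonly produced by projecting a neuron's output direction onto the model's unembedding and ranking every vocabulary token by the result. The sketch below shows that ranking step under that assumption; it is not this dashboard's actual code. The names `logit_effects` and `top_and_bottom`, and the toy two-dimensional weights in the usage note, are hypothetical.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: a token's "logit" here is the dot product of the neuron's
# output weight vector with that token's unembedding column.
def logit_effects(w_out: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Direct effect of the neuron on each token's logit (hypothetical helper)."""
    return {tok: sum(a * b for a, b in zip(w_out, col)) for tok, col in unembed.items()}

def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted and the k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Toy example with made-up 2-d weights (not real model values).
effects = logit_effects([1.0, -0.5], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]})
positive, negative = top_and_bottom(effects, k=1)
print(positive, negative)
```

    With these toy weights, token "a" is promoted (1.0) and "b" is suppressed (-0.5), which mirrors how the Positive and Negative Logits lists are read.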
    Activation Density: 0.003%
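    An activation density of 0.003% means the neuron fires on roughly 3 of every 100,000 token positions. A minimal sketch of that calculation follows, assuming "active" means an activation strictly above zero; the real dashboard's threshold is not stated here, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    ASSUMPTION: the default threshold of 0.0 is illustrative only.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# 3 firing positions out of 100,000 tokens -> about 0.003 (percent).
acts = [0.0] * 100_000
for i in (10, 5_000, 99_999):
    acts[i] = 1.0
print(activation_density(acts))
```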

    No Known Activations