INDEX
    Explanations

    Neuron 4 did not activate on any token in the provided examples, which suggests it responds to a pattern or feature that is absent from these text samples.

    Negative Logits
    Token             Logit
    " rocks"          -0.79
    "inas"            -0.72
    " rall"           -0.70
    " scaling"        -0.68
    " inline"         -0.64
    " nutshell"       -0.64
    "gem"             -0.62
    "Interstitial"    -0.62
    " clad"           -0.61
    " pel"            -0.60

    Positive Logits
    Token             Logit
    "etsk"            0.85
    "uberty"          0.83
    "olitan"          0.78
    " Seb"            0.77
    " Sov"            0.73
    " Everybody"      0.73
    " Neb"            0.73
    " Flavoring"      0.71
    " Nobody"         0.69
    "ĨĴ"              0.69
    Act Density 0.000%
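
As a rough illustration of the figure above, activation density can be read as the fraction of sampled token positions on which the neuron fires. The sketch below assumes that definition; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard itself.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means active positions divided by total positions.
    """
    values = list(activations)
    if not values:
        # No sampled tokens: report zero rather than dividing by zero.
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


if __name__ == "__main__":
    # Hypothetical sample: a neuron that never fires on 1000 token positions.
    sample = [0.0] * 1000
    print(f"Act Density {activation_density(sample):.3%}")
```

Under this reading, a density of 0.000% is consistent with a neuron that has no recorded activations on the sampled text.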

    No Known Activations

    This feature has no known activations.