INDEX
    Explanations

    Neuron 4 does not seem to identify any specific pattern in the provided text: it activates on many different characters, and the activations appear random.

    characters from various alphabets and symbols

    Negative Logits
    Token         Logit
    "anwhile"     -0.99
    " msec"       -0.82
    "theless"     -0.81
    "ftime"       -0.78
    " agre"       -0.75
    "nyder"       -0.73
    "abase"       -0.71
    " dope"       -0.70
    "espie"       -0.70
    " cocaine"    -0.69

    Positive Logits
    Token         Logit
    "¥"           1.69
    "ı"           1.58
    "Į"           1.52
    "İ"           1.51
    "Ł"           1.51
    "»"           1.50
    "Ī"           1.49
    "²"           1.49
    "´"           1.48
    "º"           1.47
    Act Density 0.017%
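
An activation density figure like the one above is usually read as the share of sampled tokens on which the neuron fires. The following is a minimal Python sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and do not come from this page or from any particular tool.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron is active, with "active" meaning strictly above `threshold`.
    """
    if not activations:
        # No sampled tokens: report zero rather than dividing by zero.
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations, for illustration only.
sample = [0.0, 0.0, 0.0, 1.2]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.017% would mean the neuron is active on roughly 17 of every 100,000 sampled tokens.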

    No Known Activations