INDEX
    Explanations

    The neuron never activates; it doesn’t pick out any particular token or pattern.

    Negative Logits
     Bring       -0.07
     Firstly     -0.07
     babel       -0.06
    еты          -0.06
    (Tile        -0.06
    .called      -0.06
    /mock        -0.06
     projet      -0.06
    -*-          -0.06
     criticism   -0.06
    Positive Logits
    harma        0.06
    ụp           0.06
     prohibited  0.06
     fuse        0.06
    izacao       0.06
    443          0.06
    uffed        0.06
    ausal        0.06
     deprived    0.06
    ]);↵↵        0.06
    Act Density 0.006%

    No Known Activations