INDEX
    Explanations

    The neuron primarily activates on terms related to safety and protection contexts.

    Negative Logits
    -directory  -0.07
    clave       -0.06
     Sark       -0.06
     Berm       -0.06
     hearings   -0.06
     skal       -0.06
     Canter     -0.06
     Lana       -0.06
    лати        -0.06
    _accept     -0.06
    Positive Logits
     zh         0.07
     """.       0.07
    .Visible    0.07
    ostel       0.07
    いつ          0.06
    ARGS        0.06
     embedding  0.06
    ?????       0.06
     muster     0.06
    )])↵↵       0.06
    Activation Density: 0.069%

    No Known Activations