    Explanations

    This neuron detects references to rules or restrictions (e.g., guidelines, policies, ethics, morality, filters).

    Negative Logits
    Token         Logit
    "asters"      -0.07
    "-aos"        -0.07
    " pek"        -0.06
    " unions"     -0.06
    " vite"       -0.06
                  -0.06
    " Midwest"    -0.06
    "ランド"       -0.06
    " MF"         -0.06
    "show"        -0.06
    Positive Logits
    Token                        Logit
    " Vitamin"                   0.08
    "REFER"                      0.07
    " ISSUE"                     0.07
    "_demand"                    0.07
    "(contact"                   0.07
    "rients"                     0.07
    "athed"                      0.06
    " graceful"                  0.06
    "::*;↵"                      0.06
    " RoundedRectangleBorder"    0.06
    Activation Density: 0.008%

    No Known Activations