INDEX
    Explanations

    choices, requirements, conditions, options

    This neuron activates on mentions of “safe” (and related safety contexts), flagging statements about things being safe or unsafe.

    Negative Logits
    -0.07  "chez"
    -0.07  "_Struct"
    -0.06  " athletic"
    -0.06  "Personally"
    -0.06  "-stop"
    -0.06  "']↵↵"
    -0.06  " tumblr"
    -0.06  " runoff"
    -0.06  "omain"
    -0.06  "Arc"
    Positive Logits
     0.07  "ادية"
     0.07  (token not rendered)
     0.07  " wore"
     0.06  "/gui"
     0.06  "sole"
     0.06  " Shoot"
     0.06  " Directive"
     0.06  "	index"
     0.06  "(todo"
     0.06  "・ア"
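    The logit lists above rank vocabulary tokens by how strongly this neuron pushes their logits up or down. A minimal sketch of how such a ranking could be produced, assuming each token's score is the neuron's output weight vector dotted with that token's column of the unembedding matrix (a common "direct logit effect" definition that this page does not confirm). The function names `logit_effects` and `top_and_bottom`, and all weights and tokens in the toy example, are hypothetical.

```python
# Sketch: rank vocabulary tokens by a neuron's direct effect on the logits.
# Assumption: effect[t] = sum_d w_out[d] * W_U[d][t], i.e. the neuron's
# output weight vector projected through the unembedding matrix W_U.
# All weights and tokens below are hypothetical toy values.

def logit_effects(w_out, W_U):
    """Return one direct-effect score per vocabulary column of W_U."""
    vocab_size = len(W_U[0])
    return [
        sum(w_out[d] * W_U[d][t] for d in range(len(w_out)))
        for t in range(vocab_size)
    ]

def top_and_bottom(effects, vocab, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Hypothetical toy model: hidden size 2, vocabulary of 4 tokens.
vocab = ["safe", " wore", "chez", "Arc"]
w_out = [0.5, -0.2]
W_U = [
    [0.1, 0.3, -0.2, 0.0],
    [0.2, -0.1, 0.4, 0.3],
]
positive, negative = top_and_bottom(logit_effects(w_out, W_U), vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

    Sorting once and slicing both ends keeps the sketch short; for a real vocabulary of tens of thousands of tokens, `heapq.nlargest` and `heapq.nsmallest` would avoid the full sort.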
    Activation Density 0.321%
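    Activation density is read here, as an assumption, as the share of token positions on which the neuron's activation exceeds a threshold (0.0 in this sketch); the page does not state its exact threshold or dataset. The function name `activation_density` and the sample activations are hypothetical.

```python
# Sketch: activation density = fraction of token positions where the
# neuron is "active". Assumption: active means activation > threshold,
# with threshold 0.0; the dashboard's real definition may differ.

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical activations over 8 tokens; exactly one is above zero.
acts = [0.0, -0.1, 0.0, 2.3, -0.4, 0.0, 0.0, -0.2]
density = activation_density(acts)
print(f"{density:.3%}")  # prints "12.500%"
```

    Under this definition, the reported 0.321% would correspond to roughly 3.2 active positions per 1,000 tokens.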

    No Known Activations