INDEX
    Explanations

    disciplinary

    This neuron detects mentions of internal disciplinary processes or misconduct (e.g., words like “disciplinary,” “misconduct,” “internal actions”).

    Negative Logits
     went         -0.06
     Photon       -0.06
     nové         -0.06
    Systems       -0.06
    Your          -0.06
    ownik         -0.06
     creo         -0.06
     peut         -0.06
    NewLabel      -0.06
     Rox          -0.06
    Positive Logits
     misconduct   0.07
                  0.07
     druž         0.07
     ModelState   0.07
                  0.06
     Мих          0.06
     escol        0.06
     Inquiry      0.06
     виконав      0.06
    ы             0.06
    Act Density 0.003%

    No Known Activations