    Explanations

    Problems and malfunctions

    The neuron activates on words describing reported problems, faults, or investigations (e.g. incident, discovered, corrosion, difficulty).

    Negative Logits
    	ap          -0.07
     jenter      -0.07
     ----------------------------------------------------------------------------------------------------------------  -0.06
     nale        -0.06
     sug         -0.06
     mezun       -0.06
     masturb     -0.06
     intern      -0.06
     През        -0.06
    (log         -0.06
    Positive Logits
    YLES         0.07
    (Long        0.07
    rons         0.07
    irate        0.06
    .Future      0.06
     symmetric   0.06
    ;"↵          0.06
     Finding     0.06
    setFont      0.06
    _OVERFLOW    0.06
    Activation density: 0.010%

    No Known Activations