    Explanations

    The neuron activates on words that signal problems, faults, or negative evaluations (e.g. malfunctions, errors, disadvantages).

    Negative Logits
    _dropout        -0.07
    _some           -0.07
    refs            -0.07
     closeButton    -0.06
    Occ             -0.06
     cultivation    -0.06
    Kom             -0.06
    startswith      -0.06
    _vis            -0.06
     graduate       -0.06
    Positive Logits
    —               0.07
    "=>"            0.06
    。<              0.06
    inton           0.06
    erties          0.06
     massac         0.06
    ORTH            0.06
    ',['            0.06
     Sr             0.06
     Pregnancy      0.06
    Activation Density: 0.079%

    No Known Activations