    Explanations

    This neuron detects mentions of errors (e.g. the words “error” or “errors”).

    NEGATIVE LOGITS
    Token            Logit
    "dos"            -0.07
    "۳"              -0.07
    "milliseconds"   -0.06
    " εγκα"          -0.06
    " पड़"            -0.06
    " Hoffman"       -0.06
    " tắc"           -0.06
    "emsp"           -0.06
    " út"            -0.06
    "rams"           -0.06
    POSITIVE LOGITS
    Token            Logit
    " loyalty"       0.07
    "-flash"         0.07
    " gitti"         0.07
    "_;"             0.07
    "LineStyle"      0.07
    " Peygamber"     0.06
    "Document"       0.06
    "ovah"           0.06
    " بهترین"        0.06
    " =&"            0.06
    Activation Density: 0.008%

    No Known Activations