INDEX
    Explanations

    code snippets

    This neuron activates on words and tokens associated with requests to reorganize or rewrite text for clarity or formality.

    Negative Logits
    Token            Logit
    " FAIL"          -0.07
    "-template"      -0.07
    "FDA"            -0.07
    "Letters"        -0.07
    "arcer"          -0.07
    "ka"             -0.07
    "Points"         -0.06
    "vinc"           -0.06
    "_direct"        -0.06
    " Diagnostic"    -0.06

    Positive Logits
    Token            Logit
    ")이"            0.07
                     0.06
    "ulong"          0.06
    " досвід"        0.06
    " ===="          0.06
    "어진"           0.06
    "etxt"           0.06
                     0.06
    " exceeding"     0.06
    "CEEDED"         0.06
    Act Density 0.054%

    No Known Activations