    Explanations

    This neuron activates on the names of classification categories that appear in the prompt (e.g. “Informative/Educational,” “Shock/Disgust/Fear based,” “Personal stories/statements,” “Advocacy”).

    Negative Logits
     groceries  -0.07
    .business  -0.07
    minecraft  -0.07
     anlayış  -0.06
     Nicholson  -0.06
    _department  -0.06
    (token not shown)  -0.06
    MaxLength  -0.06
    راد  -0.06
    (token not shown)  -0.06
    Positive Logits
     manifest  0.07
    =.  0.06
     применя  0.06
     Welfare  0.06
     serde  0.06
     важ  0.06
     Display  0.06
     plead  0.06
     contamination  0.06
     spawn  0.06
    Act Density 0.053%
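
    The page does not define Act Density. A common reading, assumed here, is the percentage of sampled tokens on which the neuron's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "Act Density" means the share of sampled tokens on which
    the neuron fires. The page itself does not state this definition.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 53 firing tokens out of 100,000 sampled tokens.
sample = [1.0] * 53 + [0.0] * (100_000 - 53)
density = activation_density(sample)
print(f"{density:.3f}%")
```

    Under this reading, a density of 0.053% would correspond to roughly one firing token in every 1,900 sampled tokens.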

    No Known Activations