    Explanations

    This neuron detects the closing quote-and-bracket sequence (“]”) that ends the user’s “your answer” placeholder in toxic-speech instructions.

    Negative Logits
     програми
    -0.06
     Musical
    -0.06
    Informe
    -0.06
    _gift
    -0.06
     десят
    -0.06
     upbeat
    -0.06
    another
    -0.06
    atar
    -0.06
     علت
    -0.06
    ường
    -0.06
    Positive Logits
    ольз
    0.08
    .toJson
    0.06
     ]]
    0.06
     lose
    0.06
    0.06
    okes
    0.06
    mA
    0.06
    .Left
    0.06
     FOREIGN
    0.06
    .left
    0.06
    Act Density 0.000%

    No Known Activations