INDEX
    Explanations

    This neuron detects age‐rating or maturity indicators (e.g. “18+,” “mature audiences”) in content warnings.

    Negative Logits
    Token              Logit
    " alumno"          -0.07
    " meziná"          -0.07
    " prostituerte"    -0.06
    "LineNumber"       -0.06
    " datingside"      -0.06
    " tesis"           -0.06
    " atual"           -0.06
    " frase"           -0.06
    " mice"            -0.06
    " hiệu"            -0.06
    Positive Logits
    Token              Logit
    " س"               0.08
    "-п"               0.06
    " Amendment"       0.06
    "zman"             0.06
    "_lin"             0.06
    ".channel"         0.06
    (blank)            0.06
    " особ"            0.06
    (blank)            0.06
    "715"              0.06
    Act Density 0.001%

    No Known Activations