INDEX
    Explanations

    Code/Programming

    This neuron responds specifically to the “user” speaker tag in the chat‐format metadata.

    sexual situations with power dynamics.

    Negative Logits
    Token              Logit
    "ılması"           -0.07
    (not rendered)     -0.06
    " splits"          -0.06
    "cole"             -0.06
    "perimental"       -0.06
    (not rendered)     -0.06
    " steel"           -0.06
    "setw"             -0.06
    " sa"              -0.06
    "ATT"              -0.06
    Positive Logits
    Token              Logit
    " Neuro"           0.07
    " treatments"      0.07
    " tricks"          0.06
    " Convers"         0.06
    "یدن"              0.06
    " opciones"        0.06
    " Famous"          0.06
    "-kind"            0.06
    "Know"             0.06
    " قدر"             0.05
    Act Density 0.028%
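
As a rough illustration of what an activation density figure like this usually means, the sketch below assumes it is the fraction of sampled token positions on which the neuron's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron fires at all; the page itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 28 of which activate.
sample = [0.0] * 99_972 + [0.5] * 28
density = activation_density(sample)
print(f"{density:.3%}")
```

With these made-up counts, 28 active positions out of 100,000 correspond to a density of 0.028%, which shows how sparse such a neuron is.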

    No Known Activations