    Explanations

    punctuation

    The neuron fires on speaker‐turn labels (the numeric IDs marking who’s speaking).

    Negative Logits
    " ridiculous"    -0.06
    ".Creator"       -0.06
    " Володими"      -0.06
    "ená"            -0.06
    "ані"            -0.06
    " dangerous"     -0.06
    "орі"            -0.06
    "	delta"         -0.06
    " reck"          -0.06
    "emaker"         -0.06

    Positive Logits
    " Bronze"            0.07
    " entrepreneurial"   0.07
    "'));↵↵"             0.07
    "(tokens"            0.06
    " ******************************************************************************/↵"   0.06
    "!↵"                 0.06
    "ायन"                0.06
    " 좋아"               0.06
    (blank)              0.06
    ".vo"                0.06

    Activation Density: 0.025%

    No Known Activations