INDEX
    Explanations

    This neuron primarily detects informal, enthusiastic, first-person language in a social-media style (e.g. “I’m excited,” “love using,” “my thoughts,” exclamation marks, and a conversational tone).

    Negative Logits
    zp
    -0.07
    -0.07
    unt
    -0.07
     karma
    -0.07
    forcing
    -0.07
     MF
    -0.06
    _port
    -0.06
    Or
    -0.06
     leth
    -0.06
    .orig
    -0.06
    Positive Logits
     tạm
    0.07
     ()↵
    0.07
     '__
    0.07
    °}
    0.07
    "\
    0.07
    '])↵
    0.06
     etmeye
    0.06
    .addHandler
    0.06
     reprint
    0.06
     mockMvc
    0.06
    Act Density 0.066%
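The page does not define Act Density. A common reading, assumed here, is the fraction of token positions on which the neuron's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 66 of which activate the neuron.
sample = [0.0] * 99_934 + [1.5] * 66
density = activation_density(sample)
print(f"Act Density {density:.3%}")
```

Under this assumption, a density of 0.066% would mean the neuron fires on roughly 66 of every 100,000 tokens, which is consistent with a sparse, specialised detector.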

    No Known Activations