    Explanations

    words and terms related to specific geographic or cultural identities

    The neuron activates on subword tokens that begin a longer word (i.e., word-initial subword stems).

    Negative Logits (tokens quoted; a leading space is part of the token)
    Token              Logit
    " Pos"             -0.48
    " Tor"             -0.48
    " Phan"            -0.45
    " ком"             -0.45
    " Luc"             -0.44
    " cookie"          -0.44
    " Aps"             -0.43
    " неза"            -0.43
    " Коло"            -0.43
    " Pun"             -0.43
    Positive Logits
    Token                                          Logit
    "DebuggerNonUser"                              0.63
    "complexContent"                               0.62
    " initComponents"                              0.56
    "parsedMessage"                                0.54
    "BeginInit"                                    0.54
    " GenerationType"                              0.52
    "MLLoader"                                     0.52
    "󠁮" (invisible Unicode tag character)          0.51
    "VersionUID"                                   0.50
    "bootstrapcdn"                                 0.50
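    The page does not say how the logit lists are computed. A common approach, assumed here rather than confirmed by the source, is to project the neuron's output weight vector through the model's unembedding matrix and keep the tokens with the most negative and most positive scores. The sketch below does this in pure Python on a toy 2-dimensional model; the function name `top_logit_effects`, the matrix layout, the vocabulary, and all numbers are hypothetical.

```python
def top_logit_effects(direction, unembed, vocab, k=3):
    """Score each vocabulary token by the dot product of `direction` with its
    unembedding column, then return the k lowest and k highest (token, score) pairs.

    Assumed (hypothetical) layout: unembed[i][j] is the weight from hidden
    dimension i to vocabulary token j.
    """
    d_model = len(direction)
    scores = []
    for j, token in enumerate(vocab):
        # Direct effect of the direction on token j's logit.
        score = sum(direction[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]          # tokens pushed down the most
    positive = ranked[::-1][:k]    # tokens pushed up the most
    return negative, positive


# Toy example with a hypothetical 2-dimensional model and 4-token vocabulary.
direction = [1.0, -0.5]
unembed = [[0.2, -0.4, 0.6, 0.0],
           [0.4, 0.2, -0.2, 0.8]]
vocab = ["a", "b", "c", "d"]
neg, pos = top_logit_effects(direction, unembed, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

    Here token "b" scores about -0.5 and "c" about 0.7, so they head the negative and positive lists respectively. A real model would do the same projection with a matrix library over the full vocabulary.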
    Activation density: 0.634%
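    Activation density is presumably the share of sampled token positions on which the neuron fires; at 0.634%, that is roughly 6 tokens in every 1,000. Treating "fires" as "activation above zero" is an assumption, and the helper `activation_density` below is hypothetical, shown only to make the arithmetic concrete.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    The zero threshold is an assumed definition of "firing".
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical activations over 8 token positions: 2 of 8 fire.
print(activation_density([0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]))  # 25.0
```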

    No Known Activations