    Explanations

    conclusions or summary statements, typically marked by periods

    citations and foreign words

    This neuron detects sentence boundaries, firing strongly at the start-of-sentence token and at sentence-final punctuation.

    Negative Logits
    ↵↵                    -0.76
                          -0.66
    ↵↵↵                   -0.62
    ↵↵↵↵                  -0.60
    ↵↵↵↵↵↵                -0.54
     [...]                -0.54
    ↵↵↵↵↵↵↵               -0.54
     […]                  -0.52
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵   -0.52
      【                  -0.51
    Positive Logits
    bibfield    0.59
     queſta     0.52
     Bewußt     0.52
    ofür        0.51
    antaranya   0.51
     człowie    0.50
     bénévol    0.48
     biß        0.47
    󠁮           0.47
    tiérrez     0.47
    Activation Density: 2.333%

    No Known Activations