    Explanations

    summarization followed by punctuation

    The neuron detects salient, content-carrying words: important task and topic nouns and verbs (semantically informative tokens).

    Negative Logits
     tenang        0.24
     existem       0.22
     stratégie     0.22
     déplacer      0.22
     théorie       0.22
     असून          0.21
     utilisés      0.21
     demasi        0.21
     bruge         0.20
     psychiat      0.20
    Positive Logits
    ."             0.26
    .")            0.26
    .`             0.25
    .”             0.25
    。”             0.25
    _.             0.24
    ."""           0.24
    ↵↵↵↵↵↵↵↵↵↵↵    0.24
    ".             0.24
    。【            0.24
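    A common way to produce two lists like these is to project the feature's output direction through the model's unembedding matrix, then keep the highest-scoring (promoted) and lowest-scoring (suppressed) tokens. The sketch below assumes that method. It is not confirmed to be how these particular values were computed, and it ignores any final layer norm. The names `logit_effects` and `top_logits`, the d_model x vocab layout of `unembed`, and the toy vocabulary and numbers are all hypothetical.

```python
from typing import List, Sequence, Tuple

Effect = Tuple[str, float]


def logit_effects(
    direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[Effect]:
    """Score every token by direction @ unembed (assumed d_model x vocab layout)."""
    if len(direction) != len(unembed):
        raise ValueError("direction and unembed must share d_model")
    if any(len(row) != len(vocab) for row in unembed):
        raise ValueError("each unembed row must have one entry per vocab token")
    scores = [0.0] * len(vocab)
    for weight, row in zip(direction, unembed):
        # Add this residual dimension's contribution to every token's logit.
        for j, w_u in enumerate(row):
            scores[j] += weight * w_u
    return list(zip(vocab, scores))


def top_logits(effects: List[Effect], k: int) -> Tuple[List[Effect], List[Effect]]:
    """Return the k most promoted and the k most suppressed tokens."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    positive = list(reversed(ranked))[:k]  # largest scores first
    negative = ranked[:k]  # smallest (most negative) scores first
    return positive, negative


# Toy example: hypothetical 2-dimensional residual stream, 3-token vocabulary.
vocab = ["a", "b", "c"]
direction = [1.0, -0.5]
unembed = [[0.1, 0.3, -0.2], [0.4, -0.2, 0.1]]
positive, negative = top_logits(logit_effects(direction, unembed, vocab), k=1)
print([token for token, _ in positive])  # ['b']
print([token for token, _ in negative])  # ['c']
```

    In a real model, `direction` would be the feature's decoder row and `unembed` the model's unembedding weights. With a vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid sorting every score.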
    Activation Density 2.415%
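    Activation density is usually reported as the share of sampled tokens on which a feature fires. The sketch below assumes "fires" means an activation strictly above zero. The threshold, the sampled corpus, and the function name `activation_density` are assumptions, not details stated on this page. Under that definition, 2.415% corresponds to roughly one token in 41.

```python
from typing import Iterable, Sequence


def activation_density(
    activations: Iterable[Sequence[float]], threshold: float = 0.0
) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`.

    Assumed definition: the dashboard does not state its threshold or corpus.
    """
    active = 0
    total = 0
    for sequence in activations:  # one list of per-token activations per prompt
        for value in sequence:
            total += 1
            if value > threshold:
                active += 1
    if total == 0:
        raise ValueError("no tokens to measure")
    return 100.0 * active / total


# Hypothetical activations for two short prompts (8 tokens, 2 active).
sample = [[0.0, 0.0, 1.3, 0.0], [0.0, 0.7, 0.0, 0.0]]
print(f"{activation_density(sample):.3f}%")  # 25.000%
```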

    No Known Activations