Explanations
conclusions or summary statements, typically marked by periods
citations and foreign words
This neuron detects sentence boundaries, firing strongly at the start-of-sentence token and at sentence-final punctuation.
Negative Logits
↵↵                     -0.76
↵                      -0.66
↵↵↵                    -0.62
↵↵↵↵                   -0.60
↵↵↵↵↵↵                 -0.54
[...]                  -0.54
↵↵↵↵↵↵↵                -0.54
[…]                    -0.52
↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵    -0.52
【                      -0.51
Positive Logits
bibfield               0.59
queſta                 0.52
Bewußt                 0.52
ofür                   0.51
antaranya              0.51
człowie                0.50
bénévol                0.48
biß                    0.47
                       0.47
tiérrez                0.47
Activations Density 2.333%
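
As a rough illustration of how a density figure like the one above could be read, the sketch below treats activation density as the fraction of token positions where the neuron's activation exceeds a threshold. This definition, the function name `activation_density`, the threshold of 0.0, and the sample values are all assumptions made for illustration; the page does not state how its density is computed.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition for illustration; not taken from the page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 7 active positions out of 300 tokens.
sample = [0.0] * 293 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.1, 5.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "2.333%"
```

Under this reading, a density of 2.333% would mean the neuron is active on roughly 1 in 43 sampled tokens.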