INDEX
Explanations
punctuation and formatting elements indicative of conversational or written dialogue
foreign language identifiers
The neuron activates on tokens that begin sentences or quoted/parenthetical utterances — i.e., sentence-initial tokens.
Negative Logits
Autoritní     -0.45
Bereits       -0.40
BibitemShut   -0.36
ſeveral       -0.36
tranſ         -0.36
abſ           -0.35
также         -0.35
stds          -0.34
subdivision   -0.33
除此之外      -0.33
Positive Logits
Personendaten   0.88
delwed          0.82
beginnetje      0.68
httphttps       0.68
يتيمه           0.67
ReusableCell    0.66
betweenstory    0.65
verwijspagina   0.62
OFDb            0.61
AccessorTable   0.59
Activations Density: 0.002%