INDEX
Explanations
The neuron selectively activates on sentence-initial tokens, i.e. words that occur at the start of a sentence or a new paragraph.
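To make "sentence-initial" concrete, here is a minimal sketch that picks out the first word of each sentence or paragraph in a string. It assumes a simple punctuation and line-break heuristic on raw text rather than the model's actual tokenizer, and the function name `sentence_initial_words` is hypothetical. It only illustrates the kind of positions the explanation refers to, not how the explanation was produced.

```python
import re

# Hypothetical helper: a heuristic, not the model's tokenizer.
# A "start" is the beginning of the text, a position right after
# '.', '!' or '?' followed by whitespace, or a position after a line break.
_START = re.compile(r"(?:^|(?<=[.!?])\s+|\n+)\s*([A-Za-z][\w'-]*)")


def sentence_initial_words(text: str) -> list[str]:
    """Return the first word of each sentence or paragraph in `text`."""
    return [m.group(1) for m in _START.finditer(text)]


if __name__ == "__main__":
    sample = "The cat sat. It purred!\n\nThen it left."
    print(sentence_initial_words(sample))
```

The heuristic will misfire on abbreviations such as "e.g." and on text that does not use Latin-script sentence punctuation, so it should be read as a rough illustration only.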
Negative Logits
-0.07
erse
-0.07
ue
-0.07
FOUND
-0.07
อำนวย
-0.06
ょう
-0.06
readFile
-0.06
FromFile
-0.06
exper
-0.06
Tap
-0.06
Positive Logits
流
0.07
proprietor
0.06
figur
0.06
七
0.06
Disclaimer
0.06
>{{$
0.06
Humanities
0.06
황
0.06
Cs
0.06
([$
0.06
Activations Density 0.466%
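As a rough illustration of what a density figure like this measures, the sketch below computes the fraction of token positions on which a neuron's activation exceeds a threshold. This assumes density means "share of sampled tokens with a nonzero activation", which is an assumption rather than something stated on this page. The function name `activation_density` and the example activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition: active tokens / total tokens sampled.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Made-up example: 5 active positions out of 1000 sampled tokens.
    acts = [0.0] * 995 + [1.2] * 5
    print(f"{activation_density(acts):.3%}")
```

Under that assumed definition, a density of 0.466% would correspond to the neuron firing on roughly 1 in every 215 sampled tokens.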