INDEX
Explanations
The neuron consistently activates on tokens inside the “Document:” block itself, that is, on the source-text lines that follow the “Document:” header.
Dialogue involving requests for assistance and responses related to performance or well-being.
Negative Logits
Yellow      -0.08
Warnings    -0.07
кет         -0.07
ůž          -0.06
Yellow      -0.06
ulpt        -0.06
opioid      -0.06
Seahawks    -0.06
yellow      -0.06
Gupta       -0.06
Positive Logits
:::::::::::  0.07
ERA          0.07
/arm         0.07
.Empty       0.07
sta          0.07
io           0.06
]int         0.06
limits       0.06
。           0.06
.setInt      0.06
Activations Density: 0.014%