Explanations
This neuron identifies tokens related to attributing information to a source—words like “input,” “from,” “made,” or “feedback” that signal the origin of reported data or contributions.
Negative Logits
Token      Logit
reduces    -0.06
serpent    -0.06
clf        -0.06
suffer     -0.06
.diff      -0.06
devil      -0.06
duck       -0.06
ssize      -0.06
luluk      -0.06
очь        -0.06
Positive Logits
Token      Logit
inputs     0.08
input      0.08
Input      0.07
빈         0.07
выход      0.07
Solar      0.07
>\<        0.07
Fonts      0.06
FL         0.06
inputs     0.06
Activations Density 0.010%
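
The density figure can be read as the share of token positions on which the neuron fires at all. The sketch below is one minimal way to compute such a number, assuming density means "fraction of positions with activation above a threshold"; the page does not state its exact definition, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of active positions. The dashboard's
    own definition is not given on the page.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the neuron fires on 1 of 10,000 token positions.
sample = [0.0] * 9999 + [1.3]
density = activation_density(sample)
print(f"Activations Density {density:.3%}")  # prints "Activations Density 0.010%"
```

Under this reading, a density of 0.010% corresponds to roughly one active position per 10,000 tokens, which would make this a very sparse neuron.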