Explanations
Explanations and reasons
The neuron activates selectively on the question word “why,” effectively detecting when the text is prompting for an explanation.
Negative Logits
elez        -0.07
Это         -0.06
television  -0.06
GMEM        -0.06
_threads    -0.06
GREEN       -0.06
_STOP       -0.06
mission     -0.06
Second      -0.06
persuade    -0.06
Positive Logits
hydr        0.08
.getModel   0.07
OSP         0.07
dành        0.07
ensured     0.07
_ADMIN      0.06
Ralph       0.06
principle   0.06
(fi         0.06
]=[         0.06
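The two lists above appear to rank vocabulary tokens by how strongly the neuron pushes each token's output logit down or up. Here is a minimal sketch of that kind of ranking. It assumes, and the page does not say, that a token's score is the dot product of the neuron's output weight vector with that token's unembedding vector (a "direct logit attribution" view). `logit_effects`, `top_and_bottom`, and the toy weights are hypothetical and only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical helper: score each token by dot(w_out, unembedding vector of token).
# ASSUMPTION: this is one plausible way the dashboard's logit lists are produced.
def logit_effects(w_out: List[float], w_u: Dict[str, List[float]]) -> Dict[str, float]:
    return {tok: sum(a * b for a, b in zip(w_out, vec)) for tok, vec in w_u.items()}

# Hypothetical helper: return the k most negative and the k most positive tokens.
def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Toy example with invented 2-dimensional weights.
w_out = [1.0, -0.5]
w_u = {"a": [0.1, 0.0], "b": [0.0, 0.2], "c": [0.3, 0.1]}
neg, pos = top_and_bottom(logit_effects(w_out, w_u), k=1)
print(neg, pos)
```

In a real model the vectors would have the model's hidden size and the vocabulary would have tens of thousands of entries, but the ranking step is the same sort.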
Activation density: 0.016%
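A density of 0.016% is 0.00016 as a fraction, or roughly 16 firing tokens per 100,000, which fits a neuron that responds only to a single specific word. Here is a minimal sketch of how such a figure could be computed. It assumes, and the page does not say, that density is the percentage of tokens whose activation exceeds a threshold. The function name and the default threshold of 0 are hypothetical.

```python
from typing import Sequence

# Hypothetical helper: percentage of tokens on which the neuron "fires".
# ASSUMPTION: firing means activation > threshold; the dashboard's exact rule is not stated.
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy example: 16 firing tokens out of 100,000 gives a density of about 0.016%.
acts = [0.0] * 99_984 + [1.0] * 16
print(activation_density(acts))
```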