Explanations
success or failure
This neuron activates on discourse-structuring and modal or connective tokens (e.g. “if,” “however,” “instead,” “would,” “might,” “in conclusion”), so it appears to track markers of argument flow.
Negative Logits
Token           Logit
Package         -0.06
Şubat           -0.06
cuối            -0.06
Sweden          -0.06
-archive        -0.06
iya             -0.06
Feed            -0.06
partisan        -0.06
culture         -0.06
смесь           -0.06
Positive Logits
Token           Logit
Dealer           0.07
gil              0.07
Alg              0.06
configparser     0.06
insp             0.06
比               0.06
内               0.06
Malay            0.06
)initWithFrame   0.06
Ї                0.06
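The page does not say how these logit scores were computed. One common approach, assumed here, is direct logit attribution: project the neuron's output weight vector through the unembedding matrix and rank the vocabulary by the result. The sketch below ignores the final layer norm; the helper name `top_logit_effects`, the toy weights, and the placeholder tokens are hypothetical.

```python
import numpy as np

def top_logit_effects(w_out, W_U, vocab, k=10):
    """Rank tokens by the direct effect of one neuron's output direction on the logits.

    Assumption: scores = w_out @ W_U, with no layer norm applied.
    w_out: (d_model,) output weight vector of the neuron (hypothetical input).
    W_U:   (d_model, vocab_size) unembedding matrix.
    vocab: list of vocab_size token strings.
    Returns (positive, negative) lists of (token, score) pairs.
    """
    scores = w_out @ W_U                      # logit change per unit of activation
    order = np.argsort(scores)                # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: d_model = 2 and a vocabulary of 4 placeholder tokens.
w_out = np.array([1.0, -1.0])
W_U = np.array([[0.5, 0.1, -0.2, 0.0],
                [0.0, 0.3, 0.4, -0.1]])
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
pos, neg = top_logit_effects(w_out, W_U, vocab, k=2)
```

Under this reading, the small magnitudes in the tables (about ±0.06) would mean that no single token is strongly promoted or suppressed by the neuron's direct path.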
Activation density: 0.042%
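If the density is read as the share of sampled token positions on which the neuron fires (an assumption, since the page gives neither the sample nor the threshold), then 0.042% works out to about 42 activations per 100,000 tokens. A minimal sketch with a hypothetical `activation_density` helper and an assumed threshold of zero:

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "fires" means activation > threshold (default 0.0).
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy sample: 100,000 token positions, and the neuron fires on 42 of them.
acts = np.zeros(100_000)
acts[:42] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")
```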