Explanations
This neuron activates on sentence-initial or clause-linking discourse markers (e.g. “But,” “And,” “Although”) rather than content words.
Negative Logits
_preferences     -0.07
interpolation    -0.07
.GONE            -0.06
십               -0.06
.splice          -0.06
connected        -0.06
Pey              -0.06
appliance        -0.06
Colorado         -0.06
Penny            -0.06
Positive Logits
desperately      0.07
INTERN           0.07
mn               0.07
yerleş           0.06
ита              0.06
اده              0.06
وي               0.06
dd               0.06
L                0.06
tất              0.06
Activations Density 0.474%
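
The density figure can be read as the share of token positions on which the neuron fires at all. The page does not define the metric, so the sketch below assumes it is the fraction of positions with activation above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a nonzero
    (here: strictly greater than threshold) activation.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 token positions, 5 of which fire.
sample = [0.0] * 995 + [1.2, 0.4, 2.1, 0.9, 0.3]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.500%"
```

Under this reading, a density of 0.474% would mean the neuron is active on roughly 1 in 211 token positions.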