Explanations
The neuron flags adverbial “-ly” modifiers—especially emotion/manner adverbs like “excitedly” in stage directions.
Negative Logits
risk            -0.07
Compensation    -0.06
OSP             -0.06
Cost            -0.06
Liqu            -0.06
golf            -0.06
similarity      -0.06
Osman           -0.06
Anton           -0.06
Logic           -0.06
Positive Logits
excited         0.09
ään             0.07
بسی             0.07
herkes          0.07
patriotic       0.07
座              0.07
massa           0.07
hiç             0.07
!               0.07
thrilled        0.06
Activations Density 0.033%