INDEX
Explanations
strong opinions
This neuron detects negative descriptive adjectives: pejorative qualifiers such as “dirty” and “strange,” and other insulting descriptors.
Words and short phrases that express strong subjective evaluation or emotional emphasis, especially negative judgments.
Negative Logits
forsk     -0.07
Former    -0.06
澤        -0.06
khi       -0.06
_fsm      -0.06
_mas      -0.06
before    -0.06
_fee      -0.06
navCtrl   -0.06
lamin     -0.06
Positive Logits
ubb        0.07
Kiev       0.07
offspring  0.07
trứng      0.06
Treatment  0.06
Q          0.06
Counter    0.06
Views      0.06
_Show      0.06
+++        0.06
Activations Density: 0.118%