Explanations
The neuron activates on subjective, evaluative adjectives and adverbs (e.g. great, significant, urgent) that express emphasis or strong positive appraisal.
Negative Logits
Token          Logit
ilst           -0.07
crawling       -0.06
anonymously    -0.06
.selector      -0.06
HasBeenSet     -0.06
asury          -0.06
luckily        -0.06
органов        -0.06
illow          -0.06
standing       -0.06
Positive Logits
Token          Logit
门             0.07
همچ            0.07
图             0.07
strugg         0.07
Music          0.07
题             0.06
عراق           0.06
唱             0.06
_<             0.06
�              0.06
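
The token lists above are usually read as the tokens whose output logits the neuron raises or lowers most when it fires. The page does not say how the scores are computed. The sketch below assumes each score is the dot product of the neuron's output direction with a token's unembedding vector. The function name `top_logit_effects`, the token names, and all numbers are hypothetical and chosen only for illustration.

```python
def top_logit_effects(direction, unembed, k=2):
    """Rank tokens by dot(direction, unembedding vector).

    Assumption: this is one common way to derive per-token logit
    effects; it is not confirmed to be the method behind this page.
    """
    scores = {
        token: sum(d * u for d, u in zip(direction, vec))
        for token, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most suppressed tokens first
    positive = ranked[::-1][:k]  # most boosted tokens first
    return negative, positive


# Toy two-dimensional example (invented values, not real model weights).
direction = [1.0, -1.0]
unembed = {
    "tok_a": [1.0, 0.0],  # score  1.0
    "tok_b": [0.5, 0.0],  # score  0.5
    "tok_c": [0.0, 0.5],  # score -0.5
    "tok_d": [0.0, 1.0],  # score -1.0
}
negative, positive = top_logit_effects(direction, unembed)
```

In this toy setup `negative` starts with `tok_d` and `positive` starts with `tok_a`, mirroring how the two lists on the page are ordered by magnitude.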
Activations Density 0.090%
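
As a rough illustration of what a density figure like this may mean, the sketch below assumes that activation density is the fraction of sampled tokens on which the neuron's activation exceeds a threshold (zero here). Both that definition and the sample data are assumptions, not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is above `threshold`.

    Assumption: "density" means the share of tokens on which the
    neuron fires at all; the page itself does not define the term.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Invented sample: the neuron fires on 9 of 10,000 tokens.
acts = [0.0] * 9991 + [0.3] * 9
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.090%
```

Under this reading, a density of 0.090% corresponds to the neuron firing on roughly 9 tokens in every 10,000.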