Explanations
qualifying words
This neuron activates strongly on adverbs, especially words ending in “-ly.”
Negative Logits
Token         Logit
select        -0.07
harassment    -0.07
climbed       -0.07
'.↵           -0.07
Dresses       -0.07
Discussion    -0.07
-about        -0.07
Send          -0.06
MIC           -0.06
|.↵           -0.06
Positive Logits
Token         Logit
Horizon       0.06
델            0.06
Ticaret       0.06
Namespace     0.06
्छ            0.06
volume        0.06
Eug           0.06
:"#           0.06
ivant         0.06
specialize    0.06
Activations Density: 0.111%