Explanations
writing and fun
This neuron detects adjectives and adverbs that specify the assistant’s desired tone or style (e.g., “funny,” “edgy,” “like”).
Negative Logits
copied       -0.06
xfe          -0.06
Cleans       -0.06
_material    -0.06
opic         -0.05
quirer       -0.05
obten        -0.05
Гол          -0.05
Kiş          -0.05
enville      -0.05
Positive Logits
sea          0.08
những        0.07
hedge        0.07
是一个       0.07
.states      0.06
Điều         0.06
enim         0.06
*'           0.06
زمینه        0.06
July         0.06
Activations Density: 0.003%