Explanations
The neuron activates on words expressing a promise or commitment (e.g. “promise”).
Negative Logits
Frog
-0.07
мала
-0.06
intend
-0.06
blanket
-0.06
Titulo
-0.06
nut
-0.06
supper
-0.06
juices
-0.06
.priority
-0.06
_unit
-0.06
Positive Logits
ycled
0.07
py
0.07
/** ↵
0.07
${({
0.06
odia
0.06
ổng
0.06
onestly
0.06
!important
0.06
magical
0.06
än
0.06
Activations Density 0.009%
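As a rough illustration of what a density figure like this could mean, the sketch below assumes it is the fraction of token positions on which the neuron's activation is above zero. That definition, the function name `activation_density`, the threshold, and the sample data are all assumptions for illustration and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumed definition: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the neuron fires on 9 of 100,000 token positions.
sample = [0.0] * 99_991 + [1.5] * 9
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.009% would correspond to roughly 9 active tokens per 100,000, which fits a neuron that responds to a narrow set of words.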