Explanations
caring, family
This neuron detects words expressing positive actions or endorsements (e.g., giving, recommending, buying, caring, generous).
Negative Logits
Token        Logit
809          -0.07
rugs         -0.07
amik         -0.06
_today       -0.06
essel        -0.06
fluffy       -0.06
листь        -0.06
フ           -0.06
underwear    -0.06
_ASSERT      -0.06
Positive Logits
Token        Logit
moral        0.06
-div         0.06
monet        0.06
tone         0.06
biblical     0.06
perv         0.06
imperson     0.06
spor         0.06
staple       0.06
PW           0.06
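The ranked lists above pair each token with a signed value, but the page does not say how those values are produced. One common approach, assumed here and not confirmed by the source, is direct logit attribution: multiply the neuron's output weight vector by the model's unembedding matrix and rank tokens by the result. The sketch below uses plain Python lists; `top_logit_effects`, `w_out`, and `W_U` are hypothetical names chosen for illustration.

```python
def top_logit_effects(w_out, W_U, vocab, k=10):
    """Return the k most negative and k most positive per-token effects.

    Assumption: W_U is a d_model x vocab_size matrix (list of rows), and the
    neuron's direct effect on token t is sum_i w_out[i] * W_U[i][t].
    Requires k >= 1.
    """
    # Project the neuron's output direction onto every token's unembedding column.
    effects = [
        sum(w * row[t] for w, row in zip(w_out, W_U))
        for t in range(len(vocab))
    ]
    # Token indices sorted from most negative to most positive effect.
    order = sorted(range(len(vocab)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    # Largest effects first for the positive list.
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy usage with a 3-dimensional model and a 3-token vocabulary.
neg, pos = top_logit_effects(
    [0.5, -0.2, 0.1],
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    ["a", "b", "c"],
    k=2,
)
print(neg)  # most negative first
print(pos)  # most positive first
```

Under this assumption, small magnitudes such as ±0.06 simply mean no single token is strongly promoted or suppressed by the neuron's direct path.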
Activation Density: 0.356%
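The density figure is not defined on the page. A plausible reading, assumed here, is the percentage of sampled tokens on which the neuron's activation exceeds a threshold (zero by default). Under that reading, 0.356% corresponds to roughly 356 active tokens per 100,000. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation is strictly above `threshold`.

    Assumption: density is measured as a simple frequency over a token sample.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 356 active tokens out of 100,000 sampled tokens.
sample = [1.0] * 356 + [0.0] * 99_644
print(activation_density(sample))
```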