Explanations
This neuron detects words that express sincerity or genuine sentiment (e.g., “sincere,” “heartfelt,” “genuinely”).
Negative Logits
Token        Logit
.'           -0.07
Ideally      -0.06
loggedIn     -0.06
Def          -0.06
ontology     -0.06
enumerated   -0.06
orderBy      -0.06
August       -0.06
.ev          -0.06
deaths       -0.06
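Positive Logits
Token        Logit
sincere      0.14
sincerely    0.11
sincerity    0.09
誠           0.09   (Chinese/Japanese for "sincerity")
sincer       0.08
Narc         0.08
bếp          0.07   (Vietnamese for "kitchen")
heartfelt    0.07
서           0.07   (Korean syllable "seo")
默           0.07   (Chinese for "silent")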
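The two lists above rank tokens by how strongly this neuron pushes their logits up or down. The page does not say how these numbers are computed. A common approach, assumed here, is a "direct logit" view: take the dot product of the neuron's output weights with each token's unembedding vector, then sort. The names logit_effects and top_and_bottom and the toy vectors below are hypothetical and serve only as illustration.

```python
# Hypothetical sketch: rank the tokens a neuron most promotes and suppresses,
# ASSUMING its logit effect is the dot product of the neuron's output weights
# with each token's unembedding vector (not confirmed by the source page).
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(w_out: List[float], w_u: Dict[str, List[float]]) -> Dict[str, float]:
    """Return each token's logit change per unit of neuron activation."""
    return {tok: sum(a * b for a, b in zip(w_out, vec)) for tok, vec in w_u.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (k most promoted tokens, k most suppressed tokens)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional vectors, invented for illustration only.
    w_out = [1.0, 0.5]
    w_u = {
        "sincere": [0.1, 0.08],
        "heartfelt": [0.05, 0.04],
        "deaths": [-0.04, -0.04],
        "August": [-0.01, -0.04],
        "the": [0.0, 0.0],
    }
    pos, neg = top_and_bottom(logit_effects(w_out, w_u), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

With real model weights the vocabulary has tens of thousands of entries, so a vectorized matrix product would replace the Python loop. The ranking step stays the same.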
Activation density: 0.005% (the share of tokens on which this neuron activates)
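A density of 0.005% means the neuron fires on roughly 5 of every 100,000 tokens, so it is a very sparse neuron. The sketch below shows that arithmetic. It assumes "activates" means an activation above a threshold of 0. The actual threshold used for this page is not stated, and the function name activation_density is hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of tokens whose
# activation exceeds a threshold (ASSUMED to be 0 here).
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly above `threshold`."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


if __name__ == "__main__":
    # 5 firing tokens out of 100,000 gives a density of 0.005%.
    sample = [1.0] * 5 + [0.0] * 99_995
    print(f"{activation_density(sample):.3f}%")
```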