Explanations
punctuation
The neuron fires on tokens used to praise someone’s character or professional virtues, such as “responsible,” “reliable,” “integrity,” and “compassion,” along with similar descriptors of trustworthiness and professionalism.
Negative Logits
Token            Logit
otine            -0.07
grandchildren    -0.06
optimizer        -0.06
审               -0.06
оч               -0.06
hen              -0.06
shouldReceive    -0.06
垂               -0.06
<article         -0.06
isOpen           -0.06
Positive Logits
Token            Logit
boa              0.07
.General         0.07
Only             0.07
الى              0.06
/ca              0.06
cerr             0.06
hodiny           0.06
conveying        0.06
หาก              0.06
dims             0.06
Activations Density 0.089%