Explanations
The neuron selectively activates on the token “confidant” (as in the user’s “evil trusted confidant” role assignment).
Negative Logits
synchronized       -0.07
rulers             -0.07
↵                  -0.07
otherButtonTitles  -0.06
feeding            -0.06
}{$                -0.06
entering           -0.06
↵                  -0.06
allegiance         -0.06
stem               -0.06
Positive Logits
ці                 0.07
090                0.07
台                 0.07
Lawyer             0.07
Guid               0.07
oj                 0.06
워크               0.06
vé                 0.06
bt                 0.06
hodně              0.06
Activations Density: 0.002%