Explanations
Chat/forum snippets
This neuron detects the first-person pronoun “I” when the assistant refers to itself in its own messages.
Negative Logits
артам      -0.07
money      -0.07
地域       -0.06
biopsy     -0.06
Release    -0.06
Indiana    -0.06
ج          -0.06
Regions    -0.06
neue       -0.06
release    -0.06
Positive Logits
linux      0.06
Thus       0.06
ват        0.06
정신       0.06
Undefined  0.06
บ          0.06
.dst       0.06
vale       0.06
ọng        0.05
аниц       0.05
Activations Density 0.063%