INDEX
Explanations
Code and strings.
This neuron fires on the assistant’s self-referential disclaimers—particularly the “As an AI language model…” introduction and related statements of inability.
Instructions related to generating or modifying statements based on user input.
Negative Logits
Ax                   -0.07
creepy               -0.07
RV                   -0.07
Patient              -0.07
Cy                   -0.06
"%                   -0.06
Also                 -0.06
sunglasses           -0.06
cy                   -0.06
มเต                  -0.06
Positive Logits
λά                   0.06
enze                 0.06
.accessToken         0.06
/routes              0.06
Options              0.06
леж                  0.06
cancelButtonTitle    0.06
.getContent          0.06
aspirations          0.06
ي                    0.06
Activation density: 0.003%