INDEX
Explanations
code/programming
The neuron activates on directive or instruction words (e.g. “ONLY,” “Give,” “response”) in the user’s prompt.
Negative Logits
jejichž  -0.06
emojis  -0.06
.dec  -0.06
adversity  -0.06
fps  -0.06
کش  -0.06
립  -0.06
================================================  -0.06
ents  -0.06
toolbar  -0.06
Positive Logits
�  0.07
Syracuse  0.07
aku  0.06
раск  0.06
kee  0.06
portfolio  0.06
french  0.06
زیبا  0.06
strange  0.06
evac  0.06
Activations Density 0.059%