Explanations
This neuron activates on the names of the classification categories listed in the prompt (e.g., “Informative/Educational,” “Shock/Disgust/Fear based,” “Personal stories/statements,” “Advocacy”).
Negative Logits
groceries        -0.07
.business        -0.07
minecraft        -0.07
anlayış          -0.06
Nicholson        -0.06
_department      -0.06
�                -0.06
MaxLength        -0.06
راد              -0.06
�                -0.06
Positive Logits
manifest          0.07
=.                0.06
применя           0.06
Welfare           0.06
serde             0.06
важ               0.06
Display           0.06
plead             0.06
contamination     0.06
spawn             0.06
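The logit lists rank vocabulary tokens by how strongly the neuron's output pushes their logits down or up. Below is a minimal sketch of how such lists can be produced, assuming the per-token effect is the neuron's output weight vector projected through the model's unembedding matrix (final layer norm ignored). The function names `logit_effects` and `top_logit_tokens` are hypothetical, and the weights are made up; this is not the page's actual implementation.

```python
import heapq


def logit_effects(w_out, w_u, vocab):
    """Direct effect of one neuron on every token's logit (assumed method).

    w_out: the neuron's output weights, a list of d_model floats.
    w_u:   the unembedding matrix as d_model rows of len(vocab) floats.
    Returns {token: sum_i w_out[i] * w_u[i][j]}; final layer norm is ignored.
    """
    d_model = len(w_out)
    return {
        tok: sum(w_out[i] * w_u[i][j] for i in range(d_model))
        for j, tok in enumerate(vocab)
    }


def top_logit_tokens(effects, k=10):
    """Return (most_negative, most_positive) lists of (token, effect) pairs."""
    items = list(effects.items())
    most_negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    most_positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    return most_negative, most_positive


# Toy example: made-up weights for a 2-dim residual stream and 4-token vocab.
vocab = ["manifest", "groceries", "spawn", "Nicholson"]
w_out = [1.0, 0.0]
w_u = [[0.07, -0.07, 0.06, -0.06],
       [9.0, 9.0, 9.0, 9.0]]
neg, pos = top_logit_tokens(logit_effects(w_out, w_u, vocab), k=2)
print("negative:", neg)
print("positive:", pos)
```

Using `heapq.nsmallest`/`nlargest` avoids sorting the whole vocabulary when only the top k tokens are needed.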
Activation Density: 0.053%
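Activation density is the share of token positions on which the neuron is active. A minimal sketch, assuming "active" means an activation strictly above a threshold of 0; both the threshold and the function name `activation_density` are assumptions, and the trace is made up.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count positions where the neuron fires above the (assumed) threshold.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up trace: one active position out of 2,000 tokens.
acts = [0.0] * 1999 + [1.3]
print(f"{activation_density(acts):.3f}%")
```

At the reported 0.053%, the neuron is active on roughly one token in 1,900.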