Explanations
The neuron activates on mentions of romantic or sexual scandals, particularly the word “affair” and its surrounding context.
Negative Logits
قابلیت: -0.07
fees: -0.07
ilaç: -0.07
Es: -0.06
Peace: -0.06
奇: -0.06
imizde: -0.06
soon: -0.06
ساس: -0.06
Ao: -0.06
Positive Logits
(relative: 0.06
admins: 0.06
�: 0.06
picked: 0.06
exist: 0.06
Rica: 0.06
ijk: 0.06
comfy: 0.06
pretty: 0.06
_coordinate: 0.06
Activations Density 0.024%