Explanations
code/technical documentation
The neuron fires primarily on occurrences of the short function word “in”.
Negative Logits
ofs          -0.07
manager      -0.07
nelle        -0.07
dad          -0.07
friend       -0.07
bitterness   -0.06
Tester       -0.06
.oper        -0.06
ackages      -0.06
))):↵        -0.06
Positive Logits
ีช           0.07
刑           0.06
يمكن         0.06
افع          0.06
urovision    0.06
'_           0.06
вероят       0.06
huge         0.06
emales       0.06
ेव           0.06
Activations Density 0.120%
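
A density figure like the one above is commonly read as the fraction of sampled tokens on which the neuron's activation is nonzero. The sketch below illustrates that reading only; it is an assumption, not the dashboard's actual computation. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: a token counts as "active" when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, 12 of which activate the neuron.
sample = [0.0] * 9988 + [1.5] * 12
density = activation_density(sample)
print(f"{density:.3%}")  # 0.120%
```

Under this reading, a density of 0.120% means the neuron is silent on almost every token, which is what one would expect from a sparse unit.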