Explanations
The neuron activates specifically on the token “leaned” followed by “in”; that is, it detects occurrences of the phrase “leaned in.”
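One way to spot-check an explanation like this is to compare it against a plain text matcher. The sketch below is only a stand-in for the described behavior, not the neuron itself: the function name `find_leaned_in` is hypothetical, and treating the phrase as case-insensitive and whitespace-separated is an assumption.

```python
import re

# Hypothetical matcher mirroring the explanation: "leaned" followed by "in".
# Assumption: case-insensitive, any whitespace between the two words.
LEANED_IN = re.compile(r"\bleaned\s+in\b", re.IGNORECASE)


def find_leaned_in(text: str) -> list[tuple[int, int]]:
    """Return (start, end) character spans of each "leaned in" occurrence."""
    return [m.span() for m in LEANED_IN.finditer(text)]


if __name__ == "__main__":
    print(find_leaned_in("She leaned in closer, then leaned on the desk."))
```

Spans returned by the matcher could then be lined up against the positions where the neuron actually fires, to see how well the explanation holds.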
Negative Logits
Token       Logit
Namen       -0.08
pageInfo    -0.07
üssen       -0.07
时间        -0.07
pective     -0.06
inspace     -0.06
/day        -0.06
mach        -0.06
chatte      -0.06
ayıp        -0.06

Positive Logits
Token       Logit
leaned      0.12
leaning     0.10
leans       0.07
Κου         0.07
jured       0.07
driven      0.06
efined      0.06
Jean        0.06
Glenn       0.06
XT          0.06
Activations Density 0.003%
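A density figure like this can be read as the share of sampled tokens on which the neuron fires. The sketch below assumes that definition (activation strictly above a threshold, zero by default); the function name `activation_density` and the sample data are made up for illustration and do not come from this page.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly above `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Made-up sample: 3 active tokens out of 100,000.
    sample = [0.0] * 99_997 + [1.2, 0.4, 2.5]
    print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.003% corresponds to roughly 3 active tokens per 100,000 sampled.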