Explanations
The neuron selectively detects instances of the phrase “Because of.”
Negative Logits
Token         Logit
gossip        -0.07
芸            -0.07
ificar        -0.07
participate   -0.06
kj            -0.06
corrective    -0.06
сия           -0.06
landfill      -0.06
surf          -0.06
sucht         -0.06
Positive Logits
Token         Logit
ATO           0.07
ideos         0.06
Ven           0.06
$(".          0.06
ato           0.06
dir           0.06
(","          0.06
_BLUE         0.06
actor         0.06
caused        0.05
Activations Density: 0.022%