INDEX
Explanations
The neuron activates on occurrences of the substring “hurt,” as in “hurtful.”
New Auto-Interp
Negative Logits
Moff
-0.07
Availability
-0.07
miş
-0.06
lateinit
-0.06
Pal
-0.06
binaries
-0.06
mart
-0.06
criminal
-0.06
mig
-0.06
amphetamine
-0.06
POSITIVE LOGITS
($__
0.07
atetime
0.07
hurt
0.06
Ergebn
0.06
joy
0.06
_LICENSE
0.06
averages
0.06
.velocity
0.06
jours
0.06
봐
0.06
Activations Density 0.002%