INDEX
Explanations
The neuron activates on instances of the inclusive phrase “we all.”
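As a rough illustration of the explanation above, the sketch below locates occurrences of "we all" in a piece of text. It is only a text-level proxy for where the neuron might be expected to fire, not the neuron's actual computation; the helper name `find_we_all`, the case-insensitive matching, and the sample sentence are assumptions made for this example.

```python
import re

# Hypothetical pattern: "we" followed by whitespace and "all" as whole words.
# Case-insensitive matching is an assumption, not something the page states.
WE_ALL = re.compile(r"\bwe\s+all\b", re.IGNORECASE)


def find_we_all(text: str) -> list[tuple[int, int]]:
    """Return the (start, end) character span of each "we all" match."""
    return [m.span() for m in WE_ALL.finditer(text)]


# Made-up sample sentence: two matches, plus "All we" which should not match.
sample = "We all know the feeling, and we all move on. All we need is time."
spans = find_we_all(sample)
matched = [sample[start:end] for start, end in spans]
```

Word boundaries keep the pattern from matching inside longer words such as "well" or "allow".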
Negative Logits
rehab              -0.07
appare             -0.07
converse           -0.06
IntoConstraints    -0.06
gsi                -0.06
storytelling       -0.06
dress              -0.06
turf               -0.06
Orig               -0.06
mixin              -0.06
Positive Logits
meslek             0.07
elem               0.06
/T                 0.06
incap              0.06
kel                0.06
(Locale            0.06
'.',               0.06
ausal              0.06
나요               0.06
สม                 0.06
Activations Density: 0.012%