Explanations
The neuron activates on occurrences of the word “inherit” (and its close morphological variants) in text.
Negative Logits
Token       Logit
spo         -0.08
tune        -0.08
utdown      -0.08
案          -0.07
box         -0.07
станд       -0.07
ölçü        -0.07
575         -0.07
bang        -0.07
blocked     -0.07
Positive Logits
Token        Logit
inherit      0.09
inherited    0.09
inher        0.07
inherit      0.07
inheritance  0.07
Inherits     0.07
heritage     0.07
HER          0.07
heirs        0.07
inequality   0.07
Activations Density 0.010%
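As a rough illustration of how to read the density figure, the sketch below assumes that "activations density" means the fraction of sampled tokens on which the neuron's activation exceeds a threshold. That definition, the function name `activation_density`, and the toy activation values are assumptions made for illustration; they are not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly above `threshold`.

    Assumed definition, for illustration only: the dashboard may compute
    its density figure differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: the neuron fires on 1 token out of 10,000.
sample = [0.0] * 9_999 + [2.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.010% corresponds to roughly one firing token per 10,000 tokens, which would fit a neuron tied to a single word family.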