Explanations
simplicity
The neuron fires on editorial/discourse phrases that introduce simplifying assumptions (e.g. “for simplicity… we assume…”).
Negative Logits
Token          Logit
Dados          -0.07
věcí           -0.07
ENTIAL         -0.06
Dup            -0.06
據             -0.06
ичного         -0.06
vzdělávání     -0.06
-to            -0.06
धर             -0.06
촌             -0.06
Positive Logits
Token          Logit
друз           0.07
artin          0.07
miles          0.07
".$_           0.06
&,             0.06
Sakura         0.06
recent         0.06
(samples       0.06
Expl           0.06
委             0.06
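Logit lists like the two above are commonly produced by projecting a neuron's output direction onto the model's unembedding matrix and sorting the per-token values. This page does not say how its values were computed, so the following is only a minimal sketch under that assumption; `top_logit_effects`, `direction`, `W_U`, and `vocab` are hypothetical names, not part of any specific tool's API.

```python
import numpy as np


def top_logit_effects(direction, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    Assumption: the logit effect of a neuron is its output direction
    projected onto the unembedding matrix (a direct logit effect).

    direction: shape (d_model,), hypothetical neuron output direction
    W_U:       shape (d_model, vocab_size), unembedding matrix
    vocab:     list of token strings, length vocab_size
    """
    effects = direction @ W_U          # one value per vocabulary token
    order = np.argsort(effects)        # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_logit_effects(
    np.array([1.0, 0.0]),
    np.array([[0.5, -0.2, 0.1], [0.0, 0.0, 0.0]]),
    ["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

With real model weights, `k=10` would yield two ten-row lists shaped like the tables above.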
Activation Density: 0.017%
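Activation density is usually read as the fraction of token positions on which the neuron is active. The page does not define it, so the following minimal sketch assumes "active" means an activation strictly above zero; `activation_density` and its `threshold` parameter are hypothetical names.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as "active" when its activation is
    strictly greater than the threshold (zero by default).
    """
    acts = np.asarray(activations, dtype=float)
    return float((acts > threshold).mean())


# One active position out of four gives a density of 0.25 (25%).
print(activation_density([0.0, 0.0, 0.3, 0.0]))
```

Under this definition, a density of 0.017% (0.00017) means the neuron is active on roughly 17 of every 100,000 tokens, consistent with a narrowly targeted feature such as the "simplicity" phrasing described above.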