Explanations
Hypothetical situations
The neuron fires on words and phrases that signal hypothetical risk or potential negative outcomes (e.g. “could,” “theoretically,” “detrimental,” “dangerous”).
Negative Logits
Sherlock
-0.07
обще
-0.07
Labels
-0.07
Reconstruction
-0.07
specifications
-0.07
character
-0.07
xford
-0.06
={$
-0.06
explanation
-0.06
_selection
-0.06
Positive Logits
вий
0.06
kil
0.06
Async
0.06
ΑΣ
0.06
Matrix
0.06
nej
0.06
genesis
0.06
_Event
0.06
amigos
0.06
Mes
0.05
Activations Density 0.106%
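
The page does not define the density figure. Assuming it is the fraction of sampled token positions on which the neuron's activation is above zero, a minimal sketch of that calculation might look like the following. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron is active.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up activations for 8 token positions; 3 of them are non-zero.
sample = [0.0, 0.0, 1.2, 0.0, 0.0, 0.4, 0.0, 2.1]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.106% would mean the neuron is active on roughly one token in a thousand.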