INDEX
Explanations
problems or flaws
The neuron activates on critical and evaluative language that highlights weak or flawed evidence, for example “rejecting,” “scientific evidence,” “negative consequences,” “harm,” and “reasons why.”
Negative Logits
.getMethod         -0.07
CallableWrapper    -0.06
.cornerRadius      -0.06
Usuario            -0.06
belongsTo          -0.06
_hidden            -0.06
onPause            -0.06
ق                  -0.06
tục                -0.06
.addTab            -0.06
Positive Logits
ϊ                  0.07
vítěz              0.07
QUENCY             0.07
Chance             0.07
\">"               0.07
eways              0.06
RAFT               0.06
(fs                0.06
outsiders          0.06
<Real              0.06
Activations Density 0.060%