INDEX
Explanations
The neuron activates strongly on language about assumed consent, specifically the phrase “assume consent is granted.”
Negative Logits
เซ
-0.07
soils
-0.07
(*)
-0.07
oned
-0.06
名字
-0.06
jsonObj
-0.06
Σα
-0.06
redistributed
-0.06
/buttons
-0.06
('?
-0.06
Positive Logits
套
0.07
layout
0.07
estamos
0.07
asthma
0.06
backdrop
0.06
density
0.06
waterfall
0.06
Damn
0.06
prov
0.06
Riy
0.06
Activations Density 0.002%