INDEX
Explanations
The neuron fires on words that express apprehension or risk (e.g. “risk,” “afraid”).
Negative Logits
exem         -0.07
něl          -0.07
spiked       -0.06
Free         -0.06
seize        -0.06
(sz          -0.06
rejects      -0.06
safe         -0.06
nailed       -0.06
denounced    -0.06
Positive Logits
convertView  0.07
Corinth      0.06
/l           0.06
:d           0.06
={},         0.06
of           0.06
izzie        0.06
.qml         0.06
unpleasant   0.06
JI           0.06
Activations Density 0.013%
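
The page does not state how the density figure is computed. The following is a minimal sketch, assuming it is the fraction of sampled tokens on which the neuron's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to illustrate a value of this magnitude.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron
    fires at all; the dashboard itself does not define the term.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the neuron fires on 13 of 100,000 tokens.
sample = [0.0] * 99_987 + [1.5] * 13
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.013% would mean the neuron is active on roughly 13 of every 100,000 tokens, which fits a sparse neuron tied to a narrow set of words.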