INDEX
Explanations
danger or peril
The neuron flags words that denote risk or endangerment (e.g., “jeopardize,” “risk”).
Negative Logits
tantra         -0.06
surveyed       -0.06
suburbs        -0.06
rectangles     -0.06
bride          -0.06
від            -0.06
hive           -0.06
fuse           -0.06
(layout        -0.06
ProgressBar    -0.06
Positive Logits
jeopard        0.09
peril          0.08
(dep           0.07
!↵             0.07
risking        0.07
LOC            0.07
)">↵           0.07
"',↵           0.07
GPUs           0.06
ImplOptions    0.06
Activations Density: 0.006%
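
To give a sense of scale for the density figure, here is a minimal sketch that assumes "activations density" means the fraction of token positions on which the neuron's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the toy data are all assumptions made for illustration and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: a token "fires" when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 6 firing tokens out of 100,000 positions.
toy = [0.0] * 99_994 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.7]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.006%
```

Under that assumption, a density of 0.006% corresponds to roughly 6 firing tokens per 100,000, which fits a neuron that responds to a narrow set of words.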