INDEX
Explanations
The neuron activates on occurrences of the word “safe,” that is, on mentions of something being safe.
Negative Logits
Token           Logit
unreachable     -0.07
.isPresent      -0.07
вб              -0.06
chipset         -0.06
nale            -0.06
نسمة            -0.06
ponential       -0.06
♪               -0.06
nvarchar        -0.06
EmptyEntries    -0.06
Positive Logits
Token           Logit
safe            0.07
ayi             0.07
risk            0.07
risky           0.07
presidente      0.06
blindly         0.06
Volk            0.06
тор             0.06
Risk            0.06
Georgia         0.06
Activations Density 0.016%
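
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that activation density means the fraction of sampled tokens on which the neuron's activation is above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    neuron fires at all. The dashboard does not confirm this definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the neuron fires on 2 of 12,500 tokens.
sample = [0.0] * 12_498 + [0.8, 1.3]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.016% would mean the neuron is active on roughly 16 out of every 100,000 tokens, which fits a neuron tied to a single word.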