INDEX
Explanations
The neuron activates on occurrences of the word “alarm”, including its plural and other inflected forms.
Negative Logits
Token                  Logit
Po                     -0.09
Po                     -0.08
Pays                   -0.07
ingt                   -0.07
От                     -0.07
Fut                    -0.07
.PreparedStatement     -0.07
ưu                     -0.07
Sept                   -0.07
Kot                    -0.07
Positive Logits
Token                  Logit
alarm                  0.13
alarm                  0.10
Alarm                  0.10
_ALARM                 0.09
alarms                 0.09
Alarm                  0.09
alarming               0.08
horia                  0.08
inflamm                0.08
卖                     0.07
Activations Density: 0.002%
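As a rough illustration of what a density figure like this can mean, the sketch below assumes that activation density is the fraction of token positions on which the neuron's activation is above zero. That definition, the function name `activation_density`, and the toy data are assumptions made for illustration; the page itself does not state how the value was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the neuron fires on 2 of 100,000 tokens.
toy_activations = [0.0] * 99_998 + [1.7, 0.4]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.002%
```

Under this reading, a density of 0.002% corresponds to roughly one firing per 50,000 tokens, which would be consistent with a neuron tied to a single word family.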