Explanations
The neuron activates on the word “serious,” an adjective that signals severity.
Negative Logits
depleted     -0.08
lact         -0.07
canine       -0.07
Ten          -0.06
oat          -0.06
bohydr       -0.06
ANA          -0.06
upport       -0.06
Oct          -0.06
Ten          -0.06
Positive Logits
serious      0.15
seriously    0.12
seriousness  0.10
Serious      0.09
Seriously    0.09
심           0.09
ris          0.09
serious      0.08
grave        0.08
karar        0.08
Activations Density 0.008%