Explanations
This neuron activates on the adjective “severe,” marking mentions of high disease or condition severity.
Negative Logits
Token       Logit
Fake        -0.07
Strap       -0.07
Outcome     -0.07
jac         -0.06
Ey          -0.06
Chop        -0.06
tryside     -0.06
salad       -0.06
igraph      -0.06
879         -0.06
Positive Logits
Token       Logit
.scalatest  0.07
severe      0.07
možnosti    0.07
icago       0.06
derin       0.06
نبود        0.06
fillable    0.06
`(          0.06
рассказ     0.06
ђ           0.06
Activations Density 0.023%
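
The page does not say how the density figure is computed. A common reading is the fraction of token positions on which the neuron fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the neuron is
    active. The dashboard's exact definition is not given in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 2 of them active.
toy_activations = [0.0] * 9998 + [1.3, 0.4]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.023% would mean the neuron is active on roughly 23 of every 100,000 tokens, which fits a feature tied to a single adjective.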