Explanations
The neuron fires on words expressing religious-style worship or praise (e.g., “worship,” “glorification”).
NEGATIVE LOGITS
deltas      -0.08
Madd        -0.07
case        -0.07
Ein         -0.07
Finn        -0.07
eased       -0.07
-release    -0.07
(click      -0.06
dělen       -0.06
۱۰          -0.06
POSITIVE LOGITS
worship      0.14
Worship      0.12
worsh        0.09
hab          0.08
拜           0.07
hip          0.07
HIP          0.07
uner         0.07
země         0.07
reverence    0.07
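The page does not say how the logit scores are computed. A common convention in neuron viewers is to project the neuron's output weight vector onto the model's unembedding matrix and then list the highest- and lowest-scoring tokens. The sketch below assumes that convention. The names `w_out`, `W_U`, `logit_effects`, `top_and_bottom` and the toy numbers are hypothetical, chosen only to show the ranking step.

```python
from typing import List, Sequence, Tuple

def logit_effects(w_out: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a neuron's output direction (length d_model) through an
    unembedding matrix W_U (d_model rows x vocab_size columns),
    giving one score per vocabulary token. (Assumed convention.)"""
    d_model, vocab_size = len(w_out), len(W_U[0])
    return [sum(w_out[i] * W_U[i][t] for i in range(d_model))
            for t in range(vocab_size)]

def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]           # lowest scores first
    positive = ranked[::-1][:k]     # highest scores first
    return positive, negative

# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
vocab = ["worship", "reverence", "deltas", "case"]
w_out = [1.0, 0.5]
W_U = [[0.10, 0.06, -0.05, -0.04],
       [0.08, 0.02, -0.06, -0.06]]

scores = logit_effects(w_out, W_U)
positive, negative = top_and_bottom(scores, vocab, k=2)
print("POSITIVE", [(tok, round(s, 2)) for tok, s in positive])
print("NEGATIVE", [(tok, round(s, 2)) for tok, s in negative])
```

With these toy values the sketch ranks "worship" (0.14) and "reverence" (0.07) as positive and "deltas" (-0.08) and "case" (-0.07) as negative. These values were picked to echo the lists above, not computed from the real model.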
Activation density: 0.003%
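The page gives only the density figure. If density means the percentage of token positions at which the neuron's activation is above zero, then 0.003% corresponds to roughly 3 firing positions per 100,000 tokens. Both the zero threshold and the helper name `activation_density` are assumptions. The sketch below shows only the arithmetic on a hypothetical activation trace.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.
    The default threshold of 0.0 is an assumption."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical trace: the neuron fires at 3 of 100,000 token positions.
acts = [0.0] * 100_000
for pos in (10, 500, 9_000):
    acts[pos] = 2.5
print(f"Activation density: {activation_density(acts):.3f}%")  # prints 0.003%
```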