Explanations
The neuron fires on legal “release” terminology: words such as release, releases, and discharges, along with related settlement-agreement language.
Negative Logits
plode        -0.06
counterfeit  -0.06
-esteem      -0.06
igate        -0.06
KeyPressed   -0.06
effort       -0.06
Her          -0.06
hammer       -0.06
Tuesday      -0.06
Moodle       -0.06
Positive Logits
디시          0.07
ือก          0.07
hes          0.07
┃            0.07
Ran          0.06
ιαν          0.06
suspension   0.06
actions      0.06
_delivery    0.06
üş           0.06
Activations Density 0.003%