INDEX
Explanations
injuries
This neuron activates on words that describe physical harm or injury.
Negative Logits
-license    -0.07
简单        -0.06
android     -0.06
crear       -0.06
.Linear     -0.06
_set        -0.06
cả          -0.06
RAM         -0.06
Kara        -0.06
lovers      -0.06
Positive Logits
derivative      0.07
showcase        0.07
atical          0.07
تاریخی          0.06
asto            0.06
,当             0.06
intelligence    0.06
ocrats          0.06
ical            0.06
ivative         0.06
Activations Density: 0.024%