Explanations
The neuron responds to uncommon, domain-specific, or technical words (rarer vocabulary) rather than everyday terms.
Negative Logits
Token                   Logit
ケ                      -0.07
kontakt                 -0.06
/sys                    -0.06
OutOfBoundsException    -0.06
Tell                    -0.06
surv                    -0.06
turn                    -0.06
.Entity                 -0.06
件                      -0.06
Shame                   -0.06
Positive Logits
Token                   Logit
Exercises               0.07
antennas                0.07
emergencies             0.06
ebony                   0.06
grads                   0.06
VICES                   0.06
ENCIL                   0.06
trag                    0.06
magn                    0.06
chlor                   0.06
Activations Density: 0.676%
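
The page does not say how the density figure is computed. A common reading, assumed here, is the fraction of sampled token positions on which the neuron's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and do not come from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron fires at all. The dashboard itself does not confirm this.
    """
    values = list(activations)
    if not values:
        return 0.0  # no samples, so report zero rather than divide by zero
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical data: 1000 token positions, 7 of them with nonzero activation.
sample = [0.0] * 993 + [0.4, 1.2, 0.7, 2.1, 0.3, 0.9, 1.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.676% would mean the neuron is active on roughly 7 of every 1000 sampled tokens.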