Explanations
repel/repulsion
The neuron fires on occurrences of words denoting repulsion (e.g. “repel,” “repulsion”).
Negative Logits
Logit   Token
-0.07   ↵ ↵
-0.07   開発
-0.07   ngừng
-0.06   che
-0.06   massively
-0.06   překlad
-0.06   banners
-0.06   ----------------------------------------------------------------------↵
-0.06   ぐ
-0.06   sight
Positive Logits
Logit   Token
0.07    uber
0.06    Pompe
0.06    ?id
0.06    .exc
0.06    .password
0.06    �
0.06    (rhs
0.06    [out
0.06    alcan
0.06    量
Activations Density 0.003%