INDEX
Explanations
quantitative/directional relationships
The neuron fires on the literal token “positive” when labeling text sentiment.
Negative Logits
�        -0.07
cancers  -0.07
μέ       -0.07
від      -0.07
ui       -0.06
(bit     -0.06
'%       -0.06
났       -0.06
کامپی    -0.06
будто    -0.06
Positive Logits
Cosmos       0.06
BODY         0.06
inefficient  0.06
-security    0.06
domaine      0.06
entreprises  0.06
church       0.06
Cliente      0.06
_Game        0.06
ğer          0.06
Activations Density: 0.001%