Explanations
instances of strong emotional responses or tensions in discussions
chat messages and user tags
This neuron detects turn-boundary tokens, i.e., the end-of-turn / conversation boundary marker.
Negative Logits
Administrativna   -0.73
otomatig          -0.70
хьтан             -0.64
нгред             -0.62
uxxxx             -0.62
<unused41>        -0.60
ſſung             -0.60
<unused8>         -0.60
<unused3>         -0.60
<pad>             -0.60
Positive Logits
It                 0.44
I                  0.42
It                 0.41
archiviato         0.40
it                 0.38
There              0.38
He                 0.37
pretty             0.36
I                  0.36
That               0.36
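The page does not say how the logit lists are produced. A common approach, assumed here and not confirmed by the source, is to project the neuron's output weight vector through the model's unembedding matrix and keep the k most positive and k most negative entries. The sketch below does that in plain Python; `w_out`, `w_u`, `logit_contributions`, and `top_and_bottom` are hypothetical names, and the dashboard may additionally rescale or normalize its scores.

```python
from typing import List, Sequence, Tuple


def logit_contributions(w_out: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Direct effect of one neuron on every vocabulary logit (assumed method).

    w_out: the neuron's output weight vector (length d_model).
    w_u:   unembedding matrix given as d_model rows of vocab_size columns.
    """
    if len(w_u) != len(w_out):
        raise ValueError("w_u must have one row per entry of w_out")
    vocab_size = len(w_u[0])
    # Dot product of w_out with column j of w_u, for every token j.
    return [sum(w * row[j] for w, row in zip(w_out, w_u)) for j in range(vocab_size)]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int) -> Tuple[list, list]:
    """Return the k highest- and k lowest-scoring (token, score) pairs."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])  # ascending by score
    negative = [(vocab[j], scores[j]) for j in order[:k]]
    positive = [(vocab[j], scores[j]) for j in reversed(order[-k:])]
    return positive, negative


if __name__ == "__main__":
    # Toy example with made-up weights (d_model = 2, vocab_size = 4).
    vocab = ["It", "I", "<pad>", "pretty"]
    w_out = [1.0, -0.5]
    w_u = [[0.4, 0.3, -0.2, 0.1],
           [0.0, -0.1, 0.8, 0.1]]
    pos, neg = top_and_bottom(logit_contributions(w_out, w_u), vocab, k=2)
    print("positive:", [t for t, _ in pos])  # ['It', 'I']
    print("negative:", [t for t, _ in neg])  # ['<pad>', 'pretty']
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.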
Activation Density: 0.034%
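The density figure is given without a definition. Assuming it means the percentage of token positions in some evaluation corpus on which the neuron's activation exceeds a threshold (zero here, a hypothetical choice), it can be computed as below; `activation_density` is an illustrative name, not a dashboard API.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 34 active positions out of 100,000 tokens.
    acts = [0.0] * 99_966 + [1.0] * 34
    print(f"{activation_density(acts):.3f}%")  # 0.034%
```

Read this way, 0.034% corresponds to roughly 34 activating tokens per 100,000, so the neuron fires on a very small fraction of positions.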