Explanations
This neuron fires on specific trigger words that mark the start of a new section or topic.
Instances of conversational or narrative transitions.
Negative Logits
Token     Logit
Mate      -0.63
commun    -0.62
',        -0.60
transc    -0.60
'.        -0.59
lit       -0.57
ILCS      -0.55
VIDE      -0.55
virgin    -0.54
cutter    -0.54
Positive Logits
Token     Logit
Else      0.83
:(        0.76
hesda     0.75
mosp      0.73
APS       0.72
Weak      0.71
Jew       0.67
Pr        0.66
agar      0.66
auga      0.66
Activations Density: 0.147%
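
For readers unfamiliar with the density figure, the following is a minimal sketch of one common way such a number could be computed: the fraction of sampled token positions on which the neuron's activation exceeds a threshold. The function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration; the page does not state how its own figure was derived.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Hypothetical helper: the threshold of 0.0 is an assumption, not something
    the dashboard documents.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 3 active positions out of 2,000 sampled tokens.
sample = [0.0] * 1997 + [0.4, 1.2, 0.9]
density = activation_density(sample)

# Format as a percentage, in the same style as the dashboard figure.
print(f"{density:.3%}")
```

Under this reading, a density of 0.147% would mean the neuron is active on roughly 1 or 2 tokens out of every 1,000 sampled.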