INDEX
Explanations
the start of new sections or segments indicated by specific tokens
Q followed by questions
Negative Logits
Efq        -1.41
purpoſe    -1.32
pleaſure   -1.28
myſelf     -1.25
raiſ       -1.25
houſe      -1.25
Reſ        -1.23
Jefus      -1.23
Majefty    -1.23
Anſ        -1.22
Positive Logits
<eos>      0.71
se         0.56
↵↵         0.56
la         0.52
<strong>   0.51
m          0.50
za         0.49
...        0.49
。         0.48
           0.48
Activations Density 0.009%