Explanations
phrases or concepts related to curiosity or questions
structural tags and markup
Negative Logits
AndEndTag       -1.15
myſelf          -1.14
ロウィン        -1.03
[@BOS@]         -1.02
<unused41>      -1.02
<unused16>      -1.02
<unused28>      -1.02
<unused23>      -1.02
<unused8>       -1.02
<unused14>      -1.02

Positive Logits
↵↵ (two newlines)     0.52
1                     0.47
(token not shown)     0.44
↵ (newline)           0.41
(token not shown)     0.40
2                     0.39
+                     0.39
<h1>                  0.38
.                     0.38
(token not shown)     0.38
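Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that method; `w_dec`, `W_U`, `vocab`, and the helper `top_logits` are hypothetical names, and the toy arrays stand in for real model weights.

```python
import numpy as np

def top_logits(w_dec, W_U, vocab, k=3):
    """Return the k most negative and k most positive (token, logit) pairs.

    w_dec: (d_model,) decoder direction for one feature (hypothetical input)
    W_U:   (d_model, vocab_size) unembedding matrix (hypothetical input)
    vocab: list of token strings of length vocab_size
    """
    logits = w_dec @ W_U                 # direct effect of the feature on each token's logit
    order = np.argsort(logits)           # indices sorted from most negative to most positive
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
W_U = np.array([[1.0, -1.0, 0.5, 0.0],
                [0.0,  0.0, 0.5, 2.0]])
w_dec = np.array([0.4, 0.3])
neg, pos = top_logits(w_dec, W_U, ["a", "b", "c", "d"], k=2)
print(neg)
print(pos)
```

Here the per-token logits are 0.4, -0.4, 0.35 and 0.6, so "b" heads the negative list and "d" heads the positive list.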
Activation density: 0.069%
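Assuming activation density means the fraction of token positions in an evaluation corpus where the feature's activation is above zero, it can be computed as in the sketch below. The helper `activation_density`, its `threshold` parameter, and the toy activations are illustrative assumptions, not the dashboard's actual data or code.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Toy data: 10,000 token positions, of which 7 have a positive activation.
acts = np.zeros(10_000)
acts[:7] = [0.3, 1.2, 0.05, 2.0, 0.8, 0.1, 0.4]
print(f"{activation_density(acts):.3%}")  # 0.070%
```

A density this low means the feature fires on roughly 7 of every 10,000 tokens, the same order of sparsity as the 0.069% reported above.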