Explanations
phrases indicating self-reflection or self-assessment
verbs completed by specific follow-ups
Negative Logits
mittler         -0.28
ston            -0.25
Bbb             -0.24
ord             -0.24
ReusableCell    -0.23
resources       -0.22
ans             -0.22
small           -0.21
ordin           -0.21
rank            -0.21
Positive Logits
TagMode           0.71
propOrder         0.70
AndEndTag         0.70
IntoConstraints   0.69
betweenstory      0.69
<unused23>        0.68
パンチラ          0.68
<unused14>        0.68
<unused28>        0.68
<unused41>        0.68
Activations Density 2.105%