INDEX
    Explanations

    phrases indicating self-reflection or self-assessment

    verbs completed by specific follow-ups

    Negative Logits
     mittler      -0.28
    ston          -0.25
    Bbb           -0.24
     ord          -0.24
    ReusableCell  -0.23
    resources     -0.22
     ans          -0.22
     small        -0.21
     ordin        -0.21
     rank         -0.21
    Positive Logits
    TagMode           0.71
     propOrder        0.70
    AndEndTag         0.70
    IntoConstraints   0.69
     betweenstory     0.69
    <unused23>        0.68
     パンチラ         0.68
    <unused14>        0.68
    <unused28>        0.68
    <unused41>        0.68
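    The page does not say how the logit lists are produced. One common approach, assumed here rather than confirmed by this dashboard, is to project the feature's decoder direction through the model's unembedding and rank every vocabulary token by the resulting score. The sketch below shows that ranking on toy numbers; `top_logit_effects`, `decoder_dir`, `unembed`, and `vocab` are hypothetical names, not part of any real tool's API.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by the dot product of a (hypothetical) feature decoder
    direction with each token's unembedding vector.

    Assumption: unembed[i] is the d_model-sized unembedding vector for vocab[i].
    Returns (most_negative, most_positive), each a list of (token, score).
    """
    # Score each token: how strongly the feature direction pushes its logit.
    scores = [sum(d * u for d, u in zip(decoder_dir, row)) for row in unembed]
    # Sort ascending so the most negative effects come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    # Reverse the ascending order to get the most positive effects first.
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


if __name__ == "__main__":
    # Toy 2-dimensional example with a four-token vocabulary.
    neg, pos = top_logit_effects(
        [1.0, -0.5],
        [[0.2, 0.1], [-0.3, 0.4], [0.5, -0.2], [0.0, 0.0]],
        ["a", "b", "c", "d"],
        k=2,
    )
    print("negative:", [tok for tok, _ in neg])
    print("positive:", [tok for tok, _ in pos])
```

    Under this assumption, the lists above would be the ten lowest and ten highest scores over the full vocabulary.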
    Activation Density: 2.105%
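    Activation density is usually read as the percentage of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (the dashboard does not state its threshold); `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold.

    Assumption: a feature counts as active when its activation is > threshold.
    """
    if not activations:
        # No tokens observed: report zero density rather than dividing by zero.
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy activations over eight token positions; two are active.
    print(activation_density([0.0, 0.5, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]))
```

    By this definition, 2.105% would mean the feature is active on roughly 2 of every 100 tokens in the sampled data.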

    No Known Activations