    Explanations

    phrases related to instructions and guidelines

    Negative Logits
      CloseOperation   -0.98
      <unused41>       -0.92
      <unused14>       -0.92
      [@BOS@]          -0.92
      <unused28>       -0.92
      <unused68>       -0.92
      <unused8>        -0.92
      <unused51>       -0.92
      <unused3>        -0.91
      <unused16>       -0.91
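    Positive Logits
      antaranya         0.40   (Indonesian/Malay: "among them")
      Notwendigkeit     0.40   (German: "necessity")
      Gründe            0.39   (German: "reasons")
      berikutnya        0.38   (Indonesian/Malay: "next, following")
      keduanya          0.38   (Indonesian/Malay: "both of them")
      solchen           0.38   (German: "such", inflected form)
      palsu             0.38   (Indonesian/Malay: "fake, false")
      vieles            0.37   (German: "much, many things")
      rumahnya          0.37   (Indonesian/Malay: "his/her house")
      dalamnya          0.36   (Indonesian/Malay: "inside it")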
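    The logit lists above rank vocabulary tokens by how strongly the feature pushes their output logits up or down. One common way to produce such lists is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The sketch below assumes exactly that; the page does not say whether its values include a final layer norm or any normalization, and the function names and the [d_model][vocab] layout are hypothetical.

```python
from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> list[float]:
    """Project a feature's decoder direction onto every vocabulary column.

    Hypothetical layout: unembed[i][t] is residual dimension i, token t.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_tokens(effects: Sequence[float], vocab: Sequence[str],
               k: int = 10, positive: bool = True) -> list[tuple[str, float]]:
    """Return the k tokens with the largest (or most negative) logit effect."""
    order = sorted(range(len(effects)), key=lambda t: effects[t],
                   reverse=positive)
    return [(vocab[t], effects[t]) for t in order[:k]]


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
effects = logit_effects([1.0, -0.5],
                        [[0.4, -0.2, 0.0, 1.0],
                         [0.2, 0.6, -0.4, 0.0]])
print(top_tokens(effects, ["a", "b", "c", "d"], k=2))
print(top_tokens(effects, ["a", "b", "c", "d"], k=1, positive=False))
```

    In the toy example token "d" gets the largest positive effect and token "b" the most negative one, mirroring how the Positive and Negative Logits lists are read.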
    Activation density: 1.901%
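    Activation density is usually read as the share of sampled tokens on which the feature fires; at 1.901%, that is roughly 19 tokens in every 1,000. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the actual threshold and sampling corpus behind this figure are not stated on the page, and `activation_density` is a hypothetical helper name.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation to compute density")
    return 100.0 * active / total


# Two of five tokens fire, giving a density of 40%.
print(activation_density([0.0, 1.2, 0.0, 0.0, 0.3]))
```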

    No Known Activations