    Explanations

    personal pronouns and verbs indicating agency or existence

    Negative Logits
     Gegenteil          -0.33
     bocetos            -0.33
     Wochenende         -0.32
     AssemblyCulture    -0.32
     defaultstate       -0.31
     Einwilligung       -0.31
    InteropServices     -0.30
     Seguridad          -0.30
     pungkasnya         -0.30
    Saluti              -0.30
    Positive Logits
     zuſammen           0.80
     zwiſchen           0.77
    <unused79>          0.76
    <unused28>          0.76
    <unused8>           0.76
    [@BOS@]             0.76
    <unused14>          0.75
    <unused23>          0.75
    <unused3>           0.75
    <pad>               0.75
    Activation Density: 0.072%

    No Known Activations