INDEX
    Explanations

    special characters and formatting indicators

    Negative Logits
    Token            Logit
    "rungsseite"     -1.10
    " itſelf"        -1.02
    " myſelf"        -0.93
    "homonymie"      -0.92
    " متعلقه"        -0.91
    " ſche"          -0.87
    " againſt"       -0.86
    " houſe"         -0.85
    "'},\n"          -0.84
    " ―――――"         -0.84

    Positive Logits
    Token            Logit
    "endpush"        0.58
    "DoubleQuotes"   0.55
    " Big"           0.55
    " big"           0.54
    " second"        0.54
    " we"            0.52
    "<eos>"          0.52
    " di"            0.51
    " ("             0.51
    "uot"            0.51

    Activation Density: 0.224%

    No Known Activations