    Explanations

    mentions of names or identities in a text

    Negative Logits
    Token            Logit
    " „,"            -0.96
    " inder"         -0.96
    " ?..."          -0.93
    " effe"          -0.92
    " »>"            -0.82
    " §."            -0.81
    " desir"         -0.81
    " uncin"         -0.81
    " aen"           -0.79
    " illi"          -0.79

    Positive Logits
    Token            Logit
    " nor"           1.33
    "nor"            0.94
    " anymore"       0.90
    " whatsoever"    0.82
    " neither"       0.81
    " sondern"       0.80
    " Nor"           0.78
    "Nor"            0.75
    " unless"        0.74
    " except"        0.74
    Activation Density: 0.804%

    No Known Activations