INDEX
    Explanations

    names of people or places

    letters or characters that appear frequently in the text

    Negative Logits
        " censored"    -0.77
        " coli"        -0.70
        " destro"      -0.70
        " hairs"       -0.70
        " envy"        -0.70
        " kernels"     -0.68
        " mosqu"       -0.68
        " proxies"     -0.67
        " mosaic"      -0.67
        " prol"        -0.67
    Positive Logits
        "idd"          1.04
        "afer"         0.99
        "inn"          0.99
        "ady"          0.99
        "urd"          0.98
        "itz"          0.98
        "oor"          0.97
        "acker"        0.97
        "alla"         0.96
        "ü"            0.95
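    Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the lowest and highest scores. The sketch below assumes that convention; the function name `logit_effects`, the nested-list layout of `W_U`, and the toy numbers are hypothetical and illustrative only, not this tool's actual pipeline.

```python
# Hypothetical sketch: rank vocabulary tokens by how strongly a feature's
# decoder direction pushes their logits, assuming score_j = sum_i d[i] * W_U[i][j].
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  W_U: Sequence[Sequence[float]],
                  vocab: Sequence[str],
                  k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative, most positive) lists of (token, score) pairs.

    direction: length-d_model vector (assumed to be the feature's decoder row).
    W_U: d_model x vocab_size unembedding matrix as nested lists (assumed layout).
    """
    d_model = len(direction)
    # Project the direction through the unembedding: one score per token.
    scores = [sum(direction[i] * W_U[i][j] for i in range(d_model))
              for j in range(len(vocab))]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores, most negative first
    positive = ranked[::-1][:k]    # highest scores, most positive first
    return negative, positive


# Toy usage with a 2-dimensional model and a 4-token vocabulary (made-up values).
vocab = [" censored", "idd", " envy", "afer"]
direction = [1.0, -0.5]
W_U = [[-0.5, 0.8, -0.2, 0.6],
       [0.5, -0.4, 0.3, -0.2]]
neg, pos = logit_effects(direction, W_U, vocab, k=2)
```

    Sorting once and slicing both ends keeps the sketch short; for a real vocabulary of tens of thousands of tokens a partial selection (for example `heapq.nsmallest` / `heapq.nlargest`) would avoid the full sort.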
    Activation Density: 0.160%
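    A density of 0.160% means the feature is active on roughly 16 of every 10,000 tokens. Assuming density is defined as the share of tokens whose activation exceeds zero (the exact threshold behind this figure is not stated), it could be computed as in the hypothetical helper `activation_density` below.

```python
# Hypothetical sketch: activation density as the percentage of tokens on which
# the feature's activation is above a threshold (assumed to be 0.0).
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly greater than `threshold`."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Toy usage: 16 active tokens out of 10,000 (made-up data).
acts = [0.0] * 9984 + [1.0] * 16
density = activation_density(acts)
```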

    No Known Activations