INDEX
    Explanations

    references to individuals in the text

    Negative Logits
    Token          Logit
     precis        -0.16
    ts             -0.15
    gaard          -0.15
    rana           -0.15
    izador         -0.15
    ted            -0.15
    lef            -0.14
    ogn            -0.14
    usercontent    -0.14
    stadt          -0.14
    Positive Logits
    Token          Logit
    nels           0.33
    ification      0.27
    hood           0.27
    nel            0.27
    ified          0.26
    nage           0.26
    /people        0.25
    age            0.25
    ae             0.24
    nal            0.24
    Activation Density: 0.031%

    No Known Activations