INDEX
    Explanations

    repeated phrases or descriptors referring to people, institutions, or events

    Negative Logits
    Token       Logit
    "uild"      -0.15
    "uent"      -0.15
    "ideon"     -0.14
    "nze"       -0.14
    "yer"       -0.14
    "yb"        -0.14
    "suming"    -0.14
    "zens"      -0.14
    "abis"      -0.13
    "LError"    -0.13
    Positive Logits
    Token       Logit
    " late"     0.35
    "late"      0.30
    " man"      0.27
    " incom"    0.26
    " Late"     0.26
    " estim"    0.26
    " son"      0.23
    " irre"     0.22
    "Late"      0.21
    " ever"     0.20
    Act Density 0.223%

    No Known Activations