INDEX
    Explanations

    words related to the United States or its institutions

    references to the United States

    Negative Logits
    Token            Logit
    ' STATS'         -0.68
    'theless'        -0.66
    ' hairs'         -0.58
    ' simmer'        -0.58
    'adobe'          -0.57
    ' foreseeable'   -0.57
    ' ker'           -0.57
    ' unpre'         -0.56
    ' organising'    -0.56
    ' KP'            -0.56
    Positive Logits
    Token            Logit
    '.,'             1.46
    '.?'             1.24
    '.;'             1.14
    '.:'             1.13
    '.—'             1.06
    '.,"'            1.06
    '.-'             1.04
    '.$'             1.04
    './'             1.00
    '.–'             0.93
    Activation Density: 0.049%

    No Known Activations