INDEX
    Explanations

    proper nouns, specifically names of individuals in news reports or incidents

    instances of the preposition "of"

    Negative Logits
    " functioning"    -0.77
    "attribute"       -0.72
    " disse"          -0.72
    " evaluates"      -0.68
    " derog"          -0.68
    " dictate"        -0.68
    " explan"         -0.66
    " vulner"         -0.66
    "pard"            -0.66
    " disag"          -0.65
    Positive Logits
    " Anaheim"        1.00
    " Lancaster"      0.98
    " Queens"         0.98
    " Stamford"       0.97
    " Bethlehem"      0.95
    " Wilmington"     0.94
    " Calgary"        0.94
    " Rochester"      0.94
    " Omaha"          0.93
    " Auckland"       0.93
    Activation Density: 0.070%

    No Known Activations