INDEX
    Explanations

    references to specific geographical locations, particularly cities and countries

    Negative Logits
    enegger    -0.82
    WARD       -0.72
    acet       -0.70
    erer       -0.69
    ointed     -0.69
    ORGE       -0.68
    actor      -0.66
    ODUCT      -0.66
    emin       -0.66
    Connell    -0.63

    Positive Logits
    ian         0.95
    ians        0.90
    istan       0.84
    hips        0.82
    iang        0.77
     Rapids     0.77
    Ñĭ          0.73
    etsk        0.73
    ansas       0.73
    iana        0.73
    Activation Density 0.007%

    No Known Activations