INDEX
    Explanations

    mentions of specific locations, with a focus on cities

    mentions of specific geographic locations or entities

    Negative Logits
    Token         Logit
    "essee"       -0.80
    " barriers"   -0.71
    "ulhu"        -0.67
    "IBLE"        -0.66
    "istics"      -0.63
    " bully"      -0.63
    " Jericho"    -0.62
    " Arkham"     -0.62
    " arche"      -0.61
    "ural"        -0.60
    Positive Logits
    Token         Logit
    "stre"        0.93
    "chn"         0.88
    "loo"         0.87
    "bye"         0.85
    "zel"         0.85
    "tsky"        0.83
    "bies"        0.83
    "LM"          0.83
    "nel"         0.82
    "let"         0.82
    Activation Density: 0.049%

    No Known Activations