INDEX
    Explanations

    words related to locations or geographical areas

    references to personal experience or identity

    Negative Logits
    Token       Logit
    kefeller    -0.75
    keyes       -0.74
    ernels      -0.69
    paralle     -0.69
    Reviewed    -0.66
    rama        -0.64
    iosity      -0.64
    olor        -0.63
    rieved      -0.61
    hips        -0.61
    Positive Logits
    Token       Logit
    anwhile     1.30
    asure       1.26
    lda         1.11
    zzo         1.05
    ister       0.99
    adows       0.99
    eting       0.97
    leon        0.96
    isters      0.96
    adow        0.95
    Activation Density: 0.017%

    No Known Activations