INDEX
    Explanations

    references to the city of Paris

    Negative Logits
    Token         Logit
    "tir"         -0.16
    " recru"      -0.16
    "swire"       -0.15
    "tor"         -0.15
    "orrh"        -0.15
    "vetica"      -0.15
    "halt"        -0.15
    " célib"      -0.14
    ".inflate"    -0.14
    "frau"        -0.14
    Positive Logits
    Token         Logit
    "ian"         0.35
    "ienne"       0.28
    "ians"        0.27
    "ien"         0.27
    " Hilton"     0.24
    "IAN"         0.23
    "iens"        0.22
    "cope"        0.20
    "ién"         0.19
    " Match"      0.18
    Act Density 0.017%

    No Known Activations