INDEX
    Explanations

    mentions of the city of Paris

    Negative Logits
    Token                   Logit
    "uilt"                  -0.83
    "ITH"                   -0.79
    "estern"                -0.74
    "arijuana"              -0.69
    "pta"                   -0.68
    "rha"                   -0.68
    "ocument"               -0.67
    "ramid"                 -0.67
    "isSpecialOrderable"    -0.67
    "atcher"                -0.66
    Positive Logits
    Token                   Logit
    " Hilton"               1.19
    "ienne"                 1.03
    "ian"                   1.00
    "ians"                  0.94
    "furt"                  0.88
    " Mé"                   0.86
    " Attacks"              0.84
    "iens"                  0.82
    "bourg"                 0.80
    "etta"                  0.79
    Act Density 0.007%

    No Known Activations