INDEX
    Explanations

    mentions of residences or living spaces

    Negative Logits
    "eron"        -0.15
    "tober"       -0.15
    "aver"        -0.15
    "450"         -0.15
    "ish"         -0.15
    "rowing"      -0.14
    " Lah"        -0.14
    "ero"         -0.14
    "ethod"       -0.14
    "fer"         -0.14
    Positive Logits
    "ally"         0.16
    "infeld"       0.16
    "вок"          0.15
    "é¡Ķ"          0.15
    "conti"        0.14
    "earn"         0.14
    "ÙħÙĬÙĦ"       0.14
    "oldemort"     0.14
    "ieves"        0.14
    "çĦ"           0.14
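    Some positive-logit tokens (ÙħÙĬÙĦ, é¡Ķ, çĦ) look like byte-level BPE display strings rather than readable text. Assuming the model's tokenizer uses the GPT-2 byte-to-unicode table (an assumption; the page does not name the tokenizer), each displayed character stands for one raw byte, and the bytes can be decoded as UTF-8. Under that assumption ÙħÙĬÙĦ decodes to the Arabic word ميل and é¡Ķ to 顔, while çĦ holds only the first two bytes of a three-byte character and cannot be decoded on its own. A minimal sketch; the helper names are hypothetical:

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style byte-level BPE table: map each byte 0-255 to a printable character."""
    # Bytes that are already printable characters map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    mapping = {b: chr(b) for b in printable}
    # Remaining bytes (controls, space, 0x7F-0xA0, 0xAD) are shifted to chr(256 + n).
    n = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + n)
            n += 1
    return mapping


# Inverse table: displayed character -> original byte.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def token_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed byte-level token string (hypothetical helper)."""
    return bytes(CHAR_TO_BYTE[c] for c in token)


def decode_token(token: str) -> str:
    """Decode the bytes as UTF-8; a token that ends mid-character gets U+FFFD."""
    return token_bytes(token).decode("utf-8", errors="replace")


for tok in ["ÙħÙĬÙĦ", "é¡Ķ", "çĦ"]:
    # ascii() keeps the printed output safe on terminals without UTF-8 support.
    print(ascii(tok), token_bytes(tok), ascii(decode_token(tok)))
```

    Tokens that already display as readable text, such as вок, use characters outside this table and would raise KeyError here, which suggests the page decodes some tokens before display and leaves others in raw byte form.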
    Activation Density: 0.011%
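    The activation density figure is read here as the fraction of token positions on which the feature's activation is above zero; that definition is an assumption, since the page does not state it. Under it, 0.011% means about 1.1 active positions per 10,000 tokens, or roughly one activation every 9,100 tokens. A minimal sketch of the computation, using a hypothetical activation_density helper and synthetic data:

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation is strictly above threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example (not real data): 11 active positions among 100,000 tokens.
acts = [1.0] * 11 + [0.0] * (100_000 - 11)
density = activation_density(acts)
print(f"{density:.3%}")    # 0.011%
print(round(1 / density))  # 9091: roughly one activation per 9,091 tokens
```

    The threshold parameter is exposed because some tools count a feature as active only above a small positive cutoff; with the default of 0.0, any positive activation counts.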

    No Known Activations