    Explanations

    The presence of the word "there", indicating location or existence

    Negative Logits
    s         -0.18
    ville     -0.18
    irma      -0.17
    ss        -0.17
    ette      -0.17
    ringe     -0.16
    ruit      -0.16
    richt     -0.16
    lore      -0.15
    sss       -0.15

    Positive Logits
    abouts     0.22
    zelf       0.17
    iner       0.16
    ched       0.16
    -même      0.15
    yonel      0.15
    ourcem     0.15
    after      0.15
    lef        0.15
    unto       0.14

    Activation Density: 0.056%

    No Known Activations