INDEX
    Explanations

    mentions of locations or events related to a specific context or topic

    the occurrence of the word "the" in different contexts

    Negative Logits
    "ipolar"        -0.74
    "interrupted"   -0.72
    " eleph"        -0.65
    "lessly"        -0.64
    "usher"         -0.63
    " pressures"    -0.63
    "iqueness"      -0.61
    "antha"         -0.61
    "etheless"      -0.61
    "olicy"         -0.59
    Positive Logits
    "brate"         1.43
    "brates"        1.32
    "ller"          1.18
    "llers"         1.13
    "achers"        1.12
    "achable"       1.10
    "levision"      1.09
    "legraph"       1.09
    "aching"        1.03
    "legram"        1.02
    Activation Density: 0.017%

    No Known Activations