INDEX
    Explanations

    mentions of directional placements or spatial relationships

    Negative Logits
    atis      -0.16
    rss       -0.15
    ALSE      -0.15
    æk        -0.14
    Ī         -0.14
    rink      -0.14
    .son      -0.14
    ouve      -0.14
    ato       -0.14
    rf        -0.14
    Positive Logits
    Ìĥ             0.15
    linger         0.14
    imers          0.14
     jmé           0.14
    .amazonaws     0.14
    pson           0.14
    istrovstvÃŃ    0.14
    .matcher       0.14
     marg          0.13
    743            0.13
    Activation Density: 0.012%

    No Known Activations