INDEX
    Explanations

    words related to directions and positions

    terms related to gender and directional or positional descriptors

    Negative Logits
    "gone"               -0.81
    "jan"                -0.78
    "atl"                -0.78
    "Joy"                -0.74
    "stros"              -0.74
    "ogi"                -0.72
    "mud"                -0.72
    "afety"              -0.71
    "atis"               -0.71
    "tyard"              -0.70

    Positive Logits
    " alike"              1.43
    " respectively"       1.31
    " depending"          1.14
    " versions"           0.91
    " striped"            0.86
    "depending"           0.82
    " administrations"    0.81
    " coasts"             0.80
    " grades"             0.79
    " editions"           0.78

    Activation Density: 0.292%

    No Known Activations