INDEX
Explanations
words related to directions and positions
terms related to gender and directional or positional descriptors
Negative Logits
Token            Logit
gone             -0.81
jan              -0.78
atl              -0.78
Joy              -0.74
stros            -0.74
ogi              -0.72
mud              -0.72
afety            -0.71
atis             -0.71
tyard            -0.70
Positive Logits
Token            Logit
alike            1.43
respectively     1.31
depending        1.14
versions         0.91
striped          0.86
depending        0.82
administrations  0.81
coasts           0.80
grades           0.79
editions         0.78
Activations Density: 0.292%