    Explanations

    references to separation or distance

    Negative Logits
    frauen    -0.16
    aggio     -0.16
     lại      -0.16
    iams      -0.16
    odore     -0.16
    esses     -0.15
    ovah      -0.15
    ILLA      -0.15
    empo      -0.15
    lez       -0.15

    Positive Logits
    ward       0.34
    wards      0.23
    yyyy       0.22
    yyy        0.20
    /down      0.20
    WARD       0.18
    /on        0.18
    /up        0.17
    eward      0.17
    ÌĢ         0.17
    Activation Density: 0.035%

    No Known Activations