INDEX
    Explanations

    similarity and equivalence

    NEGATIVE LOGITS
    Token        Logit
                 0.34
    "<0x00>"     0.33
    "roga"       0.33
    "ланд"       0.31
    " Grenzen"   0.31
    "மன்"        0.31
    " anzi"      0.30
    "むしろ"     0.30
    ")}]{"       0.30
    "thia"       0.30
    POSITIVE LOGITS
    Token          Logit
    " similar"     3.25
    " similarly"   3.13
    "Similar"      2.89
    " similaire"   2.88
    "similar"      2.84
    " Similar"     2.83
    "同様"         2.83
    "同樣"         2.75
    "同样的"       2.69
    " similares"   2.66
    Activation Density: 0.096%

    No Known Activations